Background

A current focus of health care systems is to improve the cost and quality of patient care. The American College of Surgeons National Surgical Quality Improvement Program (NSQIP®) is commonly used to collect and report data on institution-specific, risk-adjusted surgical outcomes. A systematic sampling approach is used to determine which surgical cases are selected for abstraction (Shiloach et al. 2010) based on the hospital’s specific program: Essentials, Procedure-Targeted, Small & Rural, or Pediatric. Participating institutions are provided with quarterly reports on risk-adjusted complication rates for a variety of postoperative occurrences including surgical site infections, renal failure, thromboembolic complications, cardiac events, readmission, and observed-to-expected (O/E) mortality. Institutional performance for specific complications is compared with other hospitals and assigned a decile rank. The ranking is reported as “Needs Improvement,” “As Expected,” or “Exemplary,” based on comparison with expected complication rates from standardized models, such as the NSQIP General Vascular Mortality Model (GVMM) for O/E mortality. Institutions can then use the NSQIP’s institution-specific benchmarking data to focus their quality improvement initiatives. NSQIP participation is effective in helping institutions identify potential problems in surgical care (Steinberg et al. 2008; Fink et al. 2002); however, in most instances, additional analyses by the participating hospitals are required to develop a better understanding of how best to prevent or decrease complications (Schilling et al. 2008).

In 2014, our institution received an NSQIP report indicating a higher than expected observed-to-expected 30-day mortality rate and was subsequently assigned a “Needs Improvement” status. We routinely review all major surgical complications and mortalities through our surgical Morbidity and Mortality conference and were surprised to learn we ranked in the lowest decile for this category. Focusing on mortality, we first reviewed the surgical literature to identify factors that might impact predicted mortality (Fink et al. 2002) and then performed chart reviews of all surgical mortalities in 2014. We initially abstracted data to better understand patient-specific risk factors and process-of-care variables that might affect mortality, including transfer status, need for emergency surgery, use and timing of “do not resuscitate” (DNR) status, “procedure risk” (low, medium, or high), NSQIP and University Hospital Consortium predicted mortality, and finally the American Society of Anesthesiologists (ASA) physical status classification system.

Based on our initial review, we turned our focus to the ASA score (Table 1) (Durham et al. 2006) as a potential contributor to our higher than expected O/E mortality rate. The ASA score is assigned by the anesthesia team and provides a baseline metric for the fitness of a patient prior to undergoing surgery. The ASA score is an important predictor of mortality in surgical patients (Davenport et al. 2006; Davenport et al. 2005) and has been specifically validated for use in the NSQIP GVMM. The NSQIP rules for data entry require the surgical clinical reviewer (SCR) to use the ASA score recorded by the anesthesia team but allow the SCR to append the suffix “E” in cases where the surgical team documents the emergent nature of the surgical procedure, if not already documented in the anesthesia-assigned ASA score. During our chart review of the 2014 mortalities, we identified several misclassified ASA scores, which greatly altered predicted mortality according to the NSQIP online preoperative risk calculator. Based on this finding, we hypothesized that misclassified ASA scores falsely decreased the expected mortality and contributed to the increased 2014 NSQIP O/E mortality at our institution.

Table 1 American Society of Anesthesiologists physical status classification system (Durham et al. 2006)

Methods

The study was approved by our institutional review board as exempt from review because it used retrospective, de-identified data. At our institution, 1264 general and vascular surgical cases were reported to NSQIP in 2014, including 33 mortalities. The medical records of these 33 patients were independently reviewed by two surgery residents (AH and SJ) who did not participate in any of the cases. The ASA score was independently calculated by each reviewer; discrepancies were then discussed to reach consensus based on published guidelines on the ASA website (asahq.org). In addition to the ASA score, the following data were abstracted: transfer from another institution, the need for emergency surgery, DNR status and timing relative to death, and procedure risk. To determine factors significantly affecting predicted mortality, all patient factors that are part of the NSQIP online calculator were reviewed, including procedure, age, gender, functional status, emergency status, wound class, steroid use, ascites present within 30 days of surgery, systemic sepsis present within 30 days of surgery, diabetes, hypertension, previous cardiac events, congestive heart failure, dyspnea on presentation, tobacco use, history of COPD, dialysis, acute renal failure, ventilator dependence, disseminated cancer, body mass index, and reclassification of the ASA score using the American Society of Anesthesiologists published guidelines (Table 1) (Durham et al. 2006). This analysis determined that the ASA score was the major factor in NSQIP modeling and that discrepancies in classification led to substantially different predicted outcomes, specifically changes from ASA 3 to ASA 4. In contrast, changes from ASA 2 to ASA 3, or from ASA 4 to ASA 5, had little impact on mortality predictions.

To understand the impact of ASA misclassification, we needed a global estimate of the incidence of ASA misclassification for the entire NSQIP population, not just the mortalities. As the differential impact between ASA 2 and ASA 3, and between ASA 4 and ASA 5, was negligible compared with that between ASA 3 and ASA 4, our study focused on ASA 3 cases only. To objectively estimate the incidence of misclassified ASA 3 patients, a random sample of 50 patients was selected from the 2014 elective NSQIP surgical cases with charted ASA 3 classifications. Additionally, 10 patients were randomly selected from the 74 emergency cases initially charted as ASA 3. These samples were used to estimate the frequency of ASA 3 over- and under-classification. Patient selection was performed with random number generating software (SAS 9.4; SAS Institute, Cary, NC).
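The case selection above amounts to simple random sampling without replacement. A minimal sketch in Python (the actual selection used SAS; the case identifiers, the size of the elective ASA 3 pool, and the seed are illustrative assumptions, while the 74 emergency cases and the 50/10 sample sizes match the text):

```python
import random

def sample_cases(case_ids, n, seed=2014):
    """Simple random sample without replacement (seed fixed for reproducibility)."""
    rng = random.Random(seed)
    return rng.sample(case_ids, n)

# Hypothetical case identifiers: the elective ASA 3 pool size (1000) is an
# assumption; the emergency pool of 74 cases is taken from the text.
elective_sample = sample_cases(list(range(1, 1001)), 50)
emergency_sample = sample_cases(list(range(1, 75)), 10)
```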

After correcting the misclassified ASA scores in the random samples, SAS 9.4 was used to simulate the number of expected deaths using the odds ratios for mortality in the published 2014 GVMM, adjusted for the new rates of each ASA class. Both over- and under-classifications were included in the model. The simulation was run 1000 times. Because the entered data represented the probabilities of a particular outcome, each run of the simulation generated a different number of predicted deaths.
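One way to picture this simulation is as a Monte Carlo draw of per-patient Bernoulli outcomes, repeated 1000 times. The sketch below assumes hypothetical class-specific mortality probabilities (the true values derive from the proprietary 2014 GVMM odds ratios) and a hypothetical split of the 1264 reported cases across ASA classes:

```python
import random

# Hypothetical per-patient 30-day mortality probabilities by ASA class.
# The real values derive from the proprietary 2014 GVMM odds ratios;
# these numbers are illustrative assumptions only.
MORTALITY_PROB = {1: 0.0005, 2: 0.002, 3: 0.01, 4: 0.08, 5: 0.35}

def simulate_expected_deaths(asa_counts, n_runs=1000, seed=42):
    """Monte Carlo estimate of expected deaths: each run draws a Bernoulli
    outcome for every patient from the class-specific probability, and the
    median death count across runs is reported."""
    rng = random.Random(seed)
    totals = []
    for _ in range(n_runs):
        deaths = 0
        for asa, count in asa_counts.items():
            p = MORTALITY_PROB[asa]
            deaths += sum(rng.random() < p for _ in range(count))
        totals.append(deaths)
    totals.sort()
    return totals[n_runs // 2]  # median predicted deaths

# Hypothetical distribution of the 1264 cases across ASA classes after
# adjusting counts for the estimated misclassification rates.
median_deaths = simulate_expected_deaths({1: 100, 2: 400, 3: 500, 4: 200, 5: 64})
```

Re-running with counts shifted between ASA 3 and ASA 4 shows how under-classification depresses the expected death count.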

Statistical analyses were performed using JMP 10 and SAS 9.4 (SAS Institute, Cary, NC). Contingency analyses using the chi-square test were performed for categorical variables.

Results

The patient characteristics of our study populations (all cases and mortalities) are shown in Table 2. Patients who died were older and more likely to be transfers from outside institutions and emergency cases (p < 0.05). In addition, certain surgical populations, namely breast and endocrine surgery patients, had no observed mortality. Characteristics such as gender, race, and Hispanic ethnicity did not differ between groups. When evaluating NSQIP variables, the mortality group also had a greater incidence of partially dependent functional status, disseminated cancer, diabetes, hypertension, tobacco use, chronic obstructive pulmonary disease, acute renal failure, dirty wounds, ascites, and ventilator use at the time of surgery (Table 3, p < 0.05). A total of 1264 NSQIP essential cases were performed during 2014. Patients transferred from other institutions were more likely to be emergency cases than non-transferred patients (43.5 vs. 7.8%, p < 0.05), and transferred patients had a higher proportion of ASA 3–5 vs. ASA 1–2 classifications than non-transfers (84.4 vs. 49.8%, p < 0.05). When comparing our study population with the NSQIP reported total population (n = 768,612), our study population had a higher proportion of transferred patients among both mortalities (57.6 vs. 29.9%, p < 0.05) and survivors (11 vs. 3.9%, p < 0.05). Additionally, our study population had a higher proportion of patients with three or more risk factors among both mortalities (87.9 vs. 60.1%, p < 0.05) and survivors (23 vs. 11.6%, p < 0.05) than the NSQIP comparison population.

Table 2 Patient characteristics of the study population
Table 3 NSQIP risk factors in the study population

Our initial medical record review of the 33 mortalities showed that 18.2% of ASA scores were misclassified, mostly in patients originally scored ASA 3 or ASA 4. Of these cases, 12.1% were under-classified (assigned a lower ASA score than warranted) and 6.1% were over-classified. Discrepancies between recorded and reclassified ASA scores appeared to be the greatest contributor to the NSQIP predicted mortality (compared with all other factors in the online NSQIP model), particularly when ASA 4 and 5 cases were under-classified as ASA 3. Emergency cases were more likely to have a higher ASA score (p < 0.05) and were more likely to be misclassified (p < 0.05). Table 4 lists the reclassified ASA scores and the medical rationale for each reclassification.

Table 4 Reasons for ASA misclassification in the study population

To determine whether ASA scores were also systematically misclassified in the all-cases population, we reviewed the medical records of 50 randomly chosen survivors initially assigned an ASA score of 3. In this random sample, the ASA score was misclassified in 16% of patients (five under-classified and two over-classified). A random sample of 10 patients who underwent emergency surgery was also analyzed. Six of these patients had misclassified ASA scores (p < 0.05), including one over-classification. Table 4 summarizes the factors that led to ASA reclassification.

Predicted mortality simulations were then performed using the misclassification rates described above. Figure 1 shows the distribution of the results of the simulation model for the ASA 3 misclassifications. The simulation showed a median of 33.46 predicted deaths (95% CI 33.44–33.48). Based upon these simulated expected deaths, our O/E mortality ratio would be 0.9864 (95% CI 0.9858–0.9871); that is, the adjusted expected mortality is essentially identical to the 33 deaths observed in 2014, whereas the NSQIP report predicted only 30.6 deaths (O/E mortality 1.0784).
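The O/E arithmetic can be checked directly. With the rounded point estimates from the text, the adjusted ratio comes to 0.9863, matching the reported 0.9864 to within rounding of the simulated estimate:

```python
observed_deaths = 33
nsqip_predicted = 30.6       # expected deaths from the NSQIP report
simulated_predicted = 33.46  # median expected deaths from the adjusted simulation

oe_reported = observed_deaths / nsqip_predicted
oe_adjusted = observed_deaths / simulated_predicted

print(round(oe_reported, 4))  # → 1.0784
print(round(oe_adjusted, 4))  # → 0.9863
```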

Fig. 1

Simulated predicted deaths adjusting for ASA misclassification. When adjusting for reclassified ASA scores in the sample populations, simulations using the odds ratios of mortality from the GVMM reports predicted increased mortality matching our institution’s observed mortality. Histogram bars depict the percentage of simulations that resulted in the mortality rates shown on the x-axis. The number of deaths with the greatest likelihood based on the simulation model was 33.5. The simulation was run 1000 times. Both over-classification and under-classification rates were included in the model

Discussion

Quality improvement initiatives such as the NSQIP and the University Hospital Consortium (UHC) databases are important benchmarking resources that allow participating hospital systems to assess the quality of care provided at their institutions (Fink et al. 2002). However, when specific quality scoring systems are used to evaluate patient care, it is assumed that the data used to assess patient acuity and outcomes are accurate. The UHC quality benchmarking process utilizes administrative data, which require accurate documentation of patients’ medical diagnoses in the medical record to correctly “risk-stratify” patient outcomes. Abstracted databases like NSQIP are assumed to be more accurate in patient risk factor stratification (Steinberg et al. 2008). However, as with any scoring system, users must understand its strengths and weaknesses to use it properly. NSQIP is designed to predict the risk of complications and mortality based on information about the patients and their medical conditions that is available prior to performing surgery. Consequently, misrepresentation of these variables, such as failure to recognize sepsis or misclassifying the ASA score, can significantly underestimate predicted mortality and thereby inaccurately categorize hospitals as poor performers.

We discovered that at our institution, an academic medical center with many trainees, ASA misclassifications were relatively common (16%), especially in emergencies (60%). Under-classification of ASA scores 4 and 5 as ASA 3 was unexpected. The statistical model we created suggests that with proper ASA classification, our institution’s predicted mortality would have matched our observed mortality. In addition, emergency surgical procedures were the most commonly misclassified, and transferred patients were most often emergency cases. Thus, the predicted mortality of institutions with a high volume of transferred emergency surgical cases, such as ours, may be artificially reduced by under-classified ASA scores. This is consistent with data suggesting that emergency cases are prone to high O/E ratios and risk under-classification in general (Hyder et al. 2015).

The reasons why so many ASA scores were misclassified are unclear. As a teaching hospital, it is tempting to assume that junior anesthesia trainees were not as familiar with ASA score calculation as they should be, or that emergency cases performed on nights and weekends were prone to “erroneous ASA classification” (Gawande et al. 2003). However, Tables 3 and 4 offer some additional insights. First, many patients who were misclassified as ASA 3, when they should have been ASA 4, presented with sepsis or a perforated viscus. In reviewing their charts, some of these patients appeared quite well on initial examination; however, their vital signs, laboratory values, and imaging met the Systemic Inflammatory Response Syndrome criteria for sepsis. In addition, all of the ASA 5 patients misclassified as ASA 3 presented with ruptured abdominal aortic aneurysm. On presentation and initial exam, because they were “contained ruptures,” these patients also appeared quite well, with minimal abnormalities in their vital signs or laboratory values. However, while these patients were clinically well appearing, ruptured aortic aneurysms are by definition assigned ASA 5 under the ASA guidelines, as the natural history of these cases is rapid progression to overt rupture and death. A few under-classified patients also had medical histories consistent with ASA 4, such as myocardial infarction within 3 months or ongoing transient ischemic attacks prior to surgery. As such, there appeared to be an over-reliance on the subjective physical appearance of the patient at the time of examination, as opposed to factors such as the natural history of the disease process and previous medical history.

The ASA score is a subjective measure of baseline patient illness; however, it is critical that all providers who assign ASA scores do so in a consistent manner. Our study provides evidence that ASA misclassification can significantly impact predicted mortality and supports continued education regarding the potential impact of ASA scoring on O/E mortality in surgical patients. To address this concern, we communicated our findings on ASA misclassification to both the surgery and anesthesia departments and provided education regarding the ASA score and its importance in calculating predicted mortality. We also modified our institutional procedure verification and time-out policy by incorporating the ASA score into the surgical time out. Our revised policy requires the attending anesthesiologist to communicate the ASA score to the surgical team as part of the time out. The attending surgeon is required to acknowledge the assigned ASA score prior to starting the operation and to initiate a discussion if there are concerns about the assigned score. In addition, surgical residents are required to use the online NSQIP calculator for all cases presented at the weekly morbidity and mortality conference. A future study will reevaluate the incidence of misclassified ASA scores after this education has been instituted.

There are several limitations to this study. First, it is a single-center retrospective review of NSQIP mortalities in a large academic medical center; consequently, the results may not apply to participating NSQIP institutions of other sizes and types. Future studies being considered include expansion of the current study to multiple institutions, as well as revisiting our ASA misclassification rate after education. Second, the actual model used to calculate NSQIP predicted mortality is proprietary, so we are unable to report the exact statistical change in predicted mortality. Finally, NSQIP models continue to become more robust; for example, future models will incorporate specific variables collected for complex hepatobiliary cases.

Conclusion

ASA misclassification significantly impacts the observed-to-expected (O/E) mortality ratio and thus how a particular institution’s safety is viewed. In our review, misclassification, particularly in emergency cases, underestimated the number of predicted deaths by up to 9%. With accurate ASA classification, observed mortality would not have exceeded expected mortality at our institution. Continued education regarding the impact of ASA scoring is essential to ensure that accurate O/E mortality data are used to assess surgical quality at participating NSQIP institutions. Our institution has since instituted a policy that the ASA score must be announced during the pre-procedure time out and agreed upon or discussed prior to incision.