INTRODUCTION

The increased emphasis on transparent, publicly accessible data in the USA over the last two decades has allowed patients and consumers to compare the performance of hospitals and clinicians across institutions and conditions.1 Transparency of healthcare quality is crucial: patients need quality information to choose between providers, insurers to procure care, and providers to improve their services.2 Several entities rate hospitals and clinicians on quality and patient safety and update their findings and assessments once or twice yearly.3

However, concerns have been raised that a high rating alone is not always associated with better clinical outcomes.4,5 The lack of transparent methodology has also created a credibility gap.6,7 A hospital's rating by one entity may not translate into a comparable rating by another, resulting in patient and stakeholder confusion.3,8,9

Previous studies comparing rating systems included hospital samples that were too small, focused mainly on highly ranked hospitals,3,8,9 or did not compare the specific components of the scores. Thus, studies that include a larger number of hospitals, not only the top-ranked ones, and that compare the performance of the rating systems are needed.

In this study, we aimed to analyze and compare the overall rankings and the diagnosis-, condition-, and procedure-specific scores a hospital receives from four national rating organizations: Hospital Compare® (HC), Healthgrades® (HG), The Leapfrog Group® (Leapfrog), and US News and World Report® (USN).

METHODS

Study Design

We performed an observational study and gathered data from four sources: HC, HG, Leapfrog, and USN. We chose these organizations because they collect data on hospitals nationwide and their findings and ratings are available without a subscription.

The Cooper University Healthcare Institutional Review Board reviewed this study and deemed it exempt from institutional review. We followed the STROBE reporting guidelines for observational studies.10

Data Sources/Databases Searched

HC is a public reporting tool from the Centers for Medicare and Medicaid Services (CMS).11 It gathers data from hospitals participating in the Medicare program.

HG is a private US company that evaluates hospitals based on risk-adjusted mortality and in-hospital complications.12 It converts data from publicly available sources into a number of stars (maximum 5) for different metrics.

Leapfrog13 is a nonprofit organization that has conducted a national hospital survey twice yearly since 2001. Hospital administrators complete the surveys, and Leapfrog verifies the accuracy of the information. Leapfrog assesses hospitals in many domains and assigns an overall safety grade from A (highest) to F (lowest).

USN is a digital media company that publishes rankings in various domains, such as education, cars, and health.14 It evaluates hospitals on multiple metrics and ranks them regionally and nationally.

Search Strategy

From February 1, 2023, to October 3, 2023, we queried quality scores and patient safety data for all acute care hospitals in the USA. We obtained the list of acute care hospitals from the American Hospital Directory, which uses publicly available sources.15

We excluded specialty, pediatric, and critical access hospitals, as well as those without at least one data entry in each searched database. We used the physical address to match hospitals listed under different names in different databases. In the final sample, we included hospitals that declined to respond to the Leapfrog Hospital Survey® but still received a safety grade. Four authors (AE, JR, TN, and SI) conducted the search, and for each hospital, one author reviewed all four databases simultaneously.
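As an illustration only, the address-based matching step above can be sketched in a few lines of code. This is a minimal sketch, not our actual workflow; the file names, column names, and normalization rule are hypothetical.

```python
import pandas as pd

def normalize_address(addr: str) -> str:
    # Crude normalization so the same street address matches across databases.
    return " ".join(addr.lower().replace(".", "").replace(",", " ").split())

# Hypothetical exports from two of the four databases; column names are illustrative.
hc = pd.read_csv("hospital_compare.csv")  # columns: name, address, hc_stars
lf = pd.read_csv("leapfrog.csv")          # columns: name, address, safety_grade

for df in (hc, lf):
    df["addr_key"] = df["address"].map(normalize_address)

# Join on the normalized physical address rather than the (possibly different) name.
merged = hc.merge(lf, on="addr_key", suffixes=("_hc", "_lf"))

# Keep only hospitals with at least one data entry in each searched database.
merged = merged.dropna(subset=["hc_stars", "safety_grade"])
```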

Variables Recorded

From the Leapfrog database, we recorded the overall hospital safety grade and, in the category "problems with surgery," the variable "dangerous object left in patient's body," the object being a sponge or a tool. We dichotomized these surgical events into a binary metric, reclassifying any score greater than 0 as "Yes" (occurred) and a score of 0 as "No" (absent). From HC, we recorded the overall number of stars. From USN and HG, we recorded the score for 30-day mortality for the following conditions: heart attack, aortic or valve surgery, bypass surgery, heart failure, colon or colon cancer surgery, stroke, COPD, and pneumonia. We also recorded each institution's overall score for the following procedures: hip fracture treatment, hip replacement surgery, and knee replacement surgery. HG also reported on surgical objects left in a patient's body. Finally, from USN, we recorded whether an institution was regionally ranked and whether it was high-performing in a specialty, condition, or procedure.
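The dichotomization of surgical events amounts to a simple threshold. A minimal sketch, with a hypothetical column name and illustrative values:

```python
import pandas as pd

# Hypothetical scores for "dangerous object left in patient's body".
df = pd.DataFrame({"object_left_score": [0.0, 0.013, 0.0, 0.027]})

# Reclassify any score greater than 0 as "Yes" (occurred) and 0 as "No" (absent).
df["object_left_event"] = (df["object_left_score"] > 0).map({True: "Yes", False: "No"})
```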

Statistical Analysis

We presented categorical variables as numbers (percentages) and continuous variables as means (± standard deviation). We converted the five Leapfrog grades and the five-star USN/HC/HG systems to a 1-to-5 scale, with 5 being the best rating. We defined discordance as any difference between the scores obtained in different databases and severe discordance as a difference larger than one point (e.g., an A on Leapfrog but 3 stars on HC). We used contingency tables to evaluate the amount of discordance between databases. We calculated the Spearman correlation coefficient to assess the strength of correlation between variables. We performed all analyses in IBM SPSS Statistics 28.0 (IBM Corp., Armonk, NY, USA).
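Although we performed the analysis in SPSS, the core steps translate directly to code. The sketch below, using hypothetical data, shows the 1-to-5 conversion, the discordance definitions, the contingency table, and the Spearman correlation (here computed with SciPy rather than SPSS):

```python
import pandas as pd
from scipy.stats import spearmanr

# Map Leapfrog letter grades to a 1-5 scale, 5 being the best rating.
GRADE_TO_SCORE = {"A": 5, "B": 4, "C": 3, "D": 2, "F": 1}

# Hypothetical ratings for five hospitals.
df = pd.DataFrame({
    "safety_grade": ["A", "C", "B", "F", "A"],  # Leapfrog grade
    "hc_stars": [5, 3, 2, 1, 3],                # HC stars (already on a 1-5 scale)
})
df["leapfrog_score"] = df["safety_grade"].map(GRADE_TO_SCORE)

# Discordance: any difference; severe discordance: a difference larger than one point.
diff = (df["leapfrog_score"] - df["hc_stars"]).abs()
df["discordant"] = diff > 0
df["severely_discordant"] = diff > 1

# Contingency table of the two rating scales and their Spearman correlation.
table = pd.crosstab(df["safety_grade"], df["hc_stars"])
rho, p = spearmanr(df["leapfrog_score"], df["hc_stars"])
print(table)
print(f"Spearman rho = {rho:.2f}, P = {p:.3f}")
```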

RESULTS

Hospital Characteristics

There were 3,871 hospitals in the American Hospital Directory database. Of those, 2,384 met our study's inclusion criteria. Hospitals were mainly in the South (940, 39.4%), followed by the Midwest (567, 23.8%), West (484, 20.3%), and Northeast (393, 16.5%). Appendix Table 5 lists the number of hospitals by state, and Appendix 2 lists the states attributed to each region. The average number of beds was 197 (± 237), the average number of yearly discharges was 7,754 (± 10,835), and the average number of patient days was 37,936 (± 64,212).

HC and Leapfrog’s Overall Ratings

The Leapfrog Hospital Safety Grade distribution showed 688 hospitals (29%) with an A, 652 (27.3%) with a B, 885 (37.1%) with a C, 153 (6.4%) with a D, and 6 (0.3%) with an F. As for the HC stars, 333 hospitals (14%) had five, 676 (28.4%) had four, 695 (29.2%) had three, 502 (21.4%) had two, and 171 (7.2%) had one. Table 1 shows the concordance of the HC star rating and the Leapfrog Hospital Safety Grade. The ratings were discordant (a difference of one point or more) 70% of the time and severely discordant (a difference of two points or more) 25.1% of the time (598 hospitals). There was a very weak correlation between the Leapfrog and HC ratings, with a Spearman correlation coefficient of 0.37 [0.33–0.40], P < 0.001.

Table 1 Concordance of HC and Leapfrog Ratings (Number of Hospitals in Each Category)

USN-Ranked Hospitals and Leapfrog and HC Ratings

USN ranked 469 hospitals (19.7%) regionally or nationally. The Leapfrog ratings of the USN-ranked hospitals showed 195 (41.6%) with an A, 120 (25.6%) with a B, 137 (29.2%) with a C, 17 (3.6%) with a D, and none with an F. The HC ratings of the same hospitals showed 110 (23.5%) with five stars, 148 (31.6%) with four stars, 129 (27.5%) with three stars, 66 (14%) with two stars, and 16 (3.4%) with one star. Only 77 hospitals (3.2%) received a USN ranking, a Leapfrog grade A, and five stars on HC. Within the USN-ranked hospital group, discordance between HC and Leapfrog was 62%, and severe discordance was 19.8%. Two hundred and ten hospitals received a USN ranking, an A or B on Leapfrog, and four or five stars on HC (Table 1).

USN High-Performing Hospitals and Leapfrog and HC Ratings (Table 2)

Table 2 Concordance of USN High-Performing Hospitals with Leapfrog and HC Ratings (Number of Hospitals in Each Category)

USN has two additional distinction categories: high-performing specialties and high-performing conditions or procedures. Two hundred seventy-nine hospitals were high-performing in one or more specialties, with an average of two specialties per hospital (± 1.7), and 1,702 hospitals were high-performing in one or more conditions or procedures, with an average of 5.16 (± 4.2) per hospital.

We divided hospitals into three groups based on the number of high-performing specialties: 2,105 (88.3%) had none, 105 (4.4%) had one, and 173 (7.3%) had two or more. Similarly, we divided the hospitals based on the number of high-performing conditions or procedures: 682 (28.6%) had none, 355 (14.9%) had one, 411 (17.2%) had two or three, 317 (13.3%) had four or five, and 619 (26%) had more than five.

Hospitals with high-performing specialties had a very weak correlation with the Leapfrog (0.13 [0.08–0.17]) and HC (0.19 [0.15–0.23]) ratings. Similarly, there was a very weak correlation for hospitals with high-performing conditions or procedures with the Leapfrog (0.21 [0.17–0.25]) and HC (0.13 [0.08–0.17]) ratings (all P < 0.001). For example, as shown in Table 2, which displays the number of hospitals by rating category, many Leapfrog grade C or HC 3-star hospitals were deemed high-performing for specialties or procedures. Conversely, many Leapfrog grade A or HC 5-star hospitals did not have a single USN high-performing specialty or procedure.

Surgical Events

Leapfrog and HG recorded events of surgical objects left in patients' bodies in 198 (8.3%) and 444 (18.6%) hospitals, respectively, but only 164 hospitals (6.9%) had events recorded in both Leapfrog and HG.

30-Day Survival Rates and Orthopedic Procedure Complications

USN and HG recorded 30-day survival rates for many conditions. Table 3 shows the degree of discordance between the two databases. For 30-day survival rates, discordance ranged from 28.3% to 52.5%, and severe discordance from 20.9% to 40.8%. For the ratings of orthopedic procedures, discordance was higher, ranging from 48% (hip replacement) to 61.2% (hip fracture), with severe discordance ranging from 35.6% (hip replacement) to 49.3% (hip fracture).

Table 3 Discordance Between Survival Rates and Complication Rating in USN and HG

DISCUSSION

Our study examined the hospital ratings of four publicly reporting entities that seek to determine how hospitals perform. We found discordance, often substantial, that may confuse patients and consumers using the data to make informed healthcare decisions.

To explain the discordance, we examine the differences between the four rating organizations (Table 4).11–13,15 First, the organizations have distinct interests and focus on different aspects of quality and safety. Their for-profit or nonprofit status might also affect some of the results. Second, each rating system records different data and chooses its own methodology. Leapfrog asks each hospital to complete a survey twice yearly and states that it validates the findings using specific methods. USN has a unique ranking of hospitals' reputations as assessed by clinicians in the field. As for the measure types used in the rankings, all four organizations include outcome measures, but the specifics vary. For example, HG only uses risk-adjusted mortality for conditions and procedural complications. Structural measures are unique to Leapfrog (use of computerized provider order entry, barcoding, physician staffing) and USN (nurse staffing), whereas HC includes efficiency and timeliness of care.

Table 4 Characteristics of Hospital Rating Organizations

When comparing USN, HG, and Leapfrog, we found different results for specific metrics, such as surgical events and 30-day mortality. The self-reported nature of the Leapfrog survey may introduce inaccuracies. However, for the rating process to be transparent, the risk-adjustment methodologies and study timelines should allow full replication and estimation of results and grades.

The findings of our study are consistent with previous work. Halasyamani and Davis compared the performance of USN and HC in acute myocardial infarction, congestive heart failure, and community-acquired pneumonia and found significant discordance between the ranking systems.9 Austin et al. analyzed four national rating systems (USN, Leapfrog, HG, and Consumer Reports®) and found that only 10% of 844 hospitals rated as high-performing by one system were rated similarly by another.3 A ranking analysis of five national rating systems (Leapfrog®, Vizient®, Truven®, Hospital Compare®, and US News®) found only a weak correlation between the ratings. One shortcoming of that study is that it analyzed only a small group of top-ranked hospitals. Furthermore, the Vizient® data are available only to subscribers.8

Unlike previous studies, ours included all adult general hospitals, not only the highly ranked ones, making it, to our knowledge, the largest to date. We also did not focus solely on the overall rating: our analysis of specific variables, such as survival and complications, was unique.

Our results and the findings of previous work show that the time has come to reflect critically on hospital rankings and their meaning for the public. Patients might struggle to grasp the reasons behind the rating differences, such as the type of data reviewed or the weight placed on each metric. As seen in our results, there was severe discordance in the classification of hospitals more than a quarter of the time. Research on how patients use these publicly reported databases shows that less than half considered online reviews important when choosing a physician, with even lower use of online reviews among patients under 65 years.16 Such conflicting publicly available data might cast suspicion on the entire rating process. The difficulty of understanding the data and the vagueness and complexity of the metrics are among the issues limiting the usefulness of hospital quality ratings.17,18 Harmonizing the ratings to make them understandable and meaningful for patients and hospitals is needed.18

One concern is that ratings often rely on 12- to 24-month-old data. From a practical point of view, logistics and data calculation are significant barriers. But how valuable is it for a patient to know that a hospital had an excellent safety record two years ago if its performance has since worsened?

The Agency for Healthcare Research and Quality (AHRQ) defines quality of care as safe, timely, equitable, effective, efficient, and patient-centered.19 However, only HC includes efficiency and effectiveness by reporting cost and throughput. Many of the measures included in the rankings are likely not meaningful to patients.18 Patients might not care whether a hospital performs too many CT scans if outcomes such as mortality and infection rates are favorable.20 Hence, rating organizations have an opportunity to adopt a more patient-centered selection and weighting of their measures.21

Our study highlights the need for more research to understand consumers' preferences when they use hospital ratings to make healthcare choices. A single unified reporting system might seem ideal, but it is unrealistic given the rating organizations' differing interests. Furthermore, consumers might be used to, if not enjoy, browsing multiple websites. Some might also refer to websites like Facebook or Yelp, where other patients may have left hospital reviews. As a result, consumers might need assistance navigating the complex, widespread, and often discordant data. Instead of a single rating system, a single website that pulls the different ratings into one location and attempts to provide a unifying interpretation of the chaos could prove much more helpful.

We have thus far discussed the impact of discrepant ratings on patients and consumers. We must also highlight the impact of ratings on hospitals. Wallenburg et al.22 studied three Dutch teaching hospitals and showed that rankings, with their high volatility, are criticized for faulty design and an inability to improve performance. Yet hospital managers and professionals meet them with ambivalence because of their concern about reputation and competition. Rankings pushed hospitals to invest in reporting: introducing different information technologies, training and disciplining clinical staff to collect and register indicator information, and standardizing care processes to enable data collection. The hospitals' criticisms are reminiscent of recent news of colleges ending their participation in USN rankings.23

Our study has several strengths, including the large number of hospitals analyzed and the variables included. It also has several limitations. First, we limited our search to adult general hospitals and, given the limitations of publicly available data, could not review every hospital in the USA. Second, we had to include hospitals that received a Leapfrog grade despite declining to complete the survey. Third, we based our conclusions on the data we collected during the study period; a search performed at another time might have yielded different results. Our search method, however, mimicked what a patient or consumer would do, i.e., look at the various ratings of one hospital simultaneously. The rating organizations' methodologies often change, and given the lack of transparency, we cannot predict how those changes will affect the discrepancies. Therefore, we can make a case for frequent analyses of their results to keep consumers informed.

CONCLUSION

The ratings of the four organizations were significantly discordant on quality metrics, overall safety rankings, 30-day survival, and orthopedic procedure complication scores. Differences in methodology, time periods, and analyzed patient populations may explain the discrepancies. Still, the discordance that hospitals have criticized may create significant confusion for patients and consumers. Future research should seek to understand consumers' needs and help them navigate the discordant data to prevent confusion.