Gaps in the quality of healthcare exist in the United States.1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21 As a consequence, measures of quality have been developed and initiatives launched to provide peer comparisons as a method for quality improvement.4,9,10,14,15,16,17,20,22,23,24,25,26,27,28,29,30,31,32,33,34,35 Building on these efforts, payers of healthcare introduced public reporting and “pay for performance” programs.36

Recognizing the need to search for gaps in care, the American Society of Breast Surgeons (ASBrS) built a patient registry called Mastery of Breast Surgery℠ (Mastery) and developed quality measures (QM) to audit.24 In Mastery, surgeons can view their own performance and immediately compare themselves to other surgeons after they enter data. As early as 2009, nearly 700 member surgeons of the ASBrS demonstrated their commitment to QM reporting by entering data on 3 QM for each of 28,000 breast cancer cases.13 We updated the results of the ASBrS measurement program for those QM accepted by the Centers for Medicare and Medicaid Services (CMS) for their quality payment programs (QPP) (Table 1).36 Our purpose was to provide transparency of member performance, investigate variability of care, and describe how this information was used to develop quality targets (benchmarks). To our knowledge, we report the largest sample of breast surgeon-entered QM encounters assembled to date.

Table 1 American Society of Breast Surgeons quality measures (QM)44

Methods

De-identified QM data were obtained from the ASBrS for the years 2011–2015. Because the data were de-identified, the Institutional Review Board of the Gundersen Health System determined that the study did not constitute human subjects research, and the need for formal IRB approval was waived.

CMS Rules and Formulas

All QM must be specified with inclusion, exclusion, and exception criteria (Table 1).36,37

Using "performance met" (PM) and "performance not met" (PNM) counts for each QM, the performance rate (PR) was calculated as follows: PR = PM/(PM + PNM). Patients with exceptions were included in the PR only if performance was met. Excluded patients were never included in the PR; for example, patients undergoing lumpectomy are excluded from the mastectomy reoperation QM.
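To illustrate the arithmetic, the following minimal Python sketch applies the rule above to hypothetical counts (not registry data); exception encounters without performance met and excluded encounters are removed before PM and PNM are tallied.

```python
# Minimal sketch of the CMS performance-rate formula described above.
# All counts are hypothetical and serve only to illustrate the arithmetic.

def performance_rate(pm: int, pnm: int) -> float:
    """Performance rate (PR) = PM / (PM + PNM)."""
    return pm / (pm + pnm)

# Hypothetical tally for one QM:
#   980 encounters with performance met (including exception cases that met performance)
#    15 encounters with performance not met and no valid exception
#     5 encounters with a valid exception but performance not met -> dropped from PR
#   200 excluded encounters (e.g., lumpectomy cases for the mastectomy
#       reoperation QM) -> never counted
print(f"PR = {performance_rate(980, 15):.1%}")  # PR = 98.5%
```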

To calculate the total number of surgeon-patient-measure encounters captured in Mastery, we summed the reports for each individual QM across all study years and all providers who entered data. SAS software, version 9.3 (SAS Institute Inc., Cary, NC) was used to report performance.

Benchmarks for performance for each QM were calculated with the Achievable Benchmark of Care™ (ABC) methodology recommended by CMS.38,39 ABC benchmarks were reviewed by the ASBrS Board of Directors in person on January 22, 2016. By the ABC method, the calculated benchmarks for six QM were 100% performance met; thus, for these measures, performance not met became a de facto "never-should-occur" event. As a result, the Patient Safety and Quality (PSQ) and executive committees recommended different benchmarks based on our member normative performance data and society expert opinion. This methodology of setting a target goal for passing a test has been termed a modified Angoff approach by educators and is similar to the process used by the European Society of Breast Cancer Specialists (EUSOMA).28,40,41,42,43 To assess annual trends in performance, the Cochran-Armitage test was used.
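For readers unfamiliar with the ABC method, the sketch below outlines the published "pared-mean" calculation as we understand it: each provider's performance fraction is adjusted as (x + 1)/(n + 2), providers are ranked, and the benchmark is the pooled performance of the top-ranked providers who together care for at least 10% of all patients. This is an illustrative Python sketch with hypothetical counts, not the CMS or ASBrS implementation.

```python
# Illustrative sketch of the Achievable Benchmark of Care (ABC) pared-mean
# calculation; provider counts are hypothetical, not Mastery data.

def abc_benchmark(providers: list[tuple[int, int]], min_fraction: float = 0.10) -> float:
    """providers: (performance_met, eligible_patients) for each surgeon."""
    total_patients = sum(n for _, n in providers)
    # Adjusted performance fraction (x + 1) / (n + 2) keeps small-volume
    # providers with perfect scores from dominating the ranking.
    ranked = sorted(providers, key=lambda p: (p[0] + 1) / (p[1] + 2), reverse=True)
    met = eligible = 0
    for x, n in ranked:
        met += x
        eligible += n
        if eligible >= min_fraction * total_patients:
            break
    return met / eligible  # pooled performance of the benchmark providers

# Hypothetical example: the top three providers cover >= 10% of all patients.
example = [(50, 50), (98, 100), (180, 200), (400, 450), (700, 800)]
print(f"ABC benchmark = {abc_benchmark(example):.1%}")  # ~93.7%
```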

Society Actions

The ASBrS performed an annual review of participating member performance for the QM captured in Mastery. The results were presented to its Board of Directors by the PSQ Committee. Initiatives to address quality concerns were then discussed or planned.

Results

Encounters Captured

A total of 1,286,011 unique provider-patient-measure encounters were captured in Mastery during 2011–2015 for 9 Qualified Clinical Data Registry (QCDR) QM.44 Encounters varied by QM from 275,619 for the specimen orientation QM to 2680 for a recently introduced hereditary risk QM (Table 2). The number of encounters differed by QM because of each measure's eligibility requirements and the time point when it first became available for reporting. The surgeon dropout rate, defined as surgeons who had entered data in prior years but did not enter any encounters for the last reporting year (2015), was 43% (354/832).

Table 2 Quality measure “performance met” and benchmarks 2011–2015

Performance

Performance and benchmarks are shown in Table 2. Performance variability and trends are shown in Fig. 1 and Table 3. The initial and final performance met rates for seven QM from 2011 to 2015 were as follows: needle biopsy (NB) (95.8, 98.5%), specimen imaging (SI) (97.9, 98.8%), antibiotic selection (AS) (98.0, 99.4%), antibiotic duration (AD) (99.0, 99.8%), no surgical site infection (NSSI) (98.8, 98.9%), specimen orientation (SO) (98.5, 98.3%), and sentinel node use (SN) (95.1, 93.4%); all p values < 0.001, indicating significant improvement in the first five QM and worsening in the last two. The performance of three QM available before 2011, as reported by Clifford et al., compared with 2015 demonstrated improvement as follows: needle biopsy (73% to 98.5%), specimen orientation (84% to 98.3%), and specimen imaging (47% to 98.8%); all p values < 0.001.13
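The annual trends in Table 3 were assessed with the Cochran-Armitage test named in the Methods. As a transparency aid, the following minimal Python sketch implements the standard two-sided form of that test; the annual counts shown are hypothetical, not values from Table 3.

```python
# Minimal sketch of a two-sided Cochran-Armitage trend test across annual
# "performance met" / total counts; example counts are hypothetical.

import numpy as np
from scipy.stats import norm

def cochran_armitage_trend(met, totals, scores=None):
    """Return (z statistic, two-sided p value) for a linear trend in proportions."""
    x = np.asarray(met, dtype=float)
    n = np.asarray(totals, dtype=float)
    s = np.asarray(scores if scores is not None else range(len(x)), dtype=float)
    p_bar = x.sum() / n.sum()                       # pooled performance rate
    t = np.sum(s * (x - n * p_bar))                 # score-weighted deviation
    var = p_bar * (1 - p_bar) * (np.sum(n * s**2) - np.sum(n * s) ** 2 / n.sum())
    z = t / np.sqrt(var)
    return z, 2 * norm.sf(abs(z))

# Hypothetical five annual cohorts with a rising performance-met rate.
z, p = cochran_armitage_trend([9580, 9700, 9760, 9810, 9850], [10000] * 5)
print(f"z = {z:.2f}, p = {p:.2g}")
```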

Fig. 1 Histograms of individual surgeons and their performance. The x-axis shows de-identified individual surgeon ID numbers; the y-axis shows performance rate from 50 to 100% "performance met"

Table 3 Annual trends of performance

The most common reasons for performance not met (PNM) by QM were “patient refusal” for NB (0.6%, 583/105,541), “fragmented tissue” for SO (0.6%, 599/105,186), “imaging not available” for SI (0.04%, 41/95,534), “attempted, not successful” for SN (0.6%, 627/99,172), “no reason given” for AS (0.4%, 419/97,206), “no reason given” for AD (0.1%, 136/96,583), “infection” for NSSI (1.4%, 1987/141,963), and “bleeding” for mastectomy reoperation (0.2%, 152/73,886). Other reasons for PNM and exceptions for each QM are in Table 4.

Table 4 Quality measurement “exceptions” and “performance not met (PNM)”

Benchmarks

With the CMS ABC formula, the benchmarks were 100% performance met for every QM except for the hereditary risk measure, which was 98%. In contrast, the ASBrS-recommended benchmarks were as follows: needle biopsy (90%), specimen orientation (95%), specimen imaging (95%), sentinel node use (90%), mastectomy reoperation rate (< 10%), hereditary risk assessment (90%), and surgical infection (< 6%; Table 2). Benchmarks for the two antibiotic QM were not established, because they have been discontinued.

Discussion

Background

In 2008, the ASBrS launched its Mastery program for breast surgeons to document the quality of their clinical performance.13 After a modified Delphi ranking, 9 of 144 breast surgical QM were chosen for ASBrS member self-assessment, benchmarking, and CMS QPP.44 Program developers and ranking participants represented diverse practice locations and types, including nonspecialty breast surgeons.

Participation

By 2017, spurred by landmark legislation and the need to improve quality, nearly 70 organizations had developed patient registries for clinicians to report more than 300 QM to CMS.36,45,46,47,48 Our registry, which started much earlier, has already captured more than one million unique patient-measure encounters and provided real-time benchmarking.

Performance

There was a high rate of performance for eight of our nine measures. For these eight measures, compliance was met in more than 94% of patient encounters. Notable examples include the high rate of preoperative diagnosis of breast cancer by needle biopsy (97.5%) and the low rates of surgical site infection and unplanned reoperation after mastectomy, both less than 2%. This level of performance exceeded most historical reports.18,49,50,51 The QM with the lowest aggregate performance was "documentation of surgeon hereditary assessment of a newly diagnosed breast cancer patient" at 86%. Overall performance for the other eight QM was excellent. However, we recognize that disparities of care may still be present. During the last 6 years of measurement, there were statistically significant changes in performance for all measures. Despite both the upward- and downward-trending changes, the absolute differences by year were small, all less than 3%, which raises the question of their clinical significance. Because the performance level of surgeons reporting in Mastery is so high, these surgeons may be a self-selected group of high performers. Supporting this concept, surgeons voluntarily reporting in a cardiac surgery registry demonstrated better performance than nonparticipants.52 Our finding of such high performance in the initial study years, followed by minimal annual change, is similar to a recent report from European breast centers.53 When this scenario occurs, there is concern that these QM may have "topped out," leaving less opportunity for future improvement. However, because the level of performance of nonparticipants in our program is unknown, we have not yet retired our QM; rather, by continuing to support them, we are endorsing their importance inside and outside our society membership.

Although aggregate performance rates were high, variability of performance existed, best demonstrated by histograms (Fig. 1). Whenever variability coexists with evidence that high performance is achievable, there is opportunity to improve overall care.54

When Performance is Not Met, What Can We Learn?

The most common reasons for not meeting performance for each measure are documented in Table 4. Even with high overall performance, there is value in identifying causes of measure noncompliance, because understanding causation affords an opportunity to improve. For example, one reason for omission of a needle biopsy for diagnosing cancer was "needle biopsy not available in my community," which represents a system and resource issue rather than a surgeon-specific issue. To support solutions, the ASBrS has provided education and certification for both ultrasound-guided and stereotactic core needle biopsy.55 In another example, the second most common reason that patients underwent an unplanned reoperation after mastectomy was a positive margin. Surgeons learning that they are comparative outliers for margin involvement may reevaluate their care processes to better assess the cancer's proximity to the mastectomy margins preoperatively.

If performance is not met for a QM because of a justifiable "nonquality" reason, then CMS defines the encounter as an "exception." In such cases, the encounter did not penalize the surgeon, because it was not included in the performance rate. For the needle-biopsy diagnosis QM, exceptions occurred in 8264 patients undergoing prophylactic mastectomy and in 1814 patients whose imaging abnormality was "too close" to an implant or the chest wall to permit safe needle biopsy. This granular level of information potentially aids improvement strategies. For example, in high-risk patients undergoing risk-reducing mastectomy, surgeons ought to pursue guideline-concordant preoperative imaging to identify nonpalpable cancers, thereby improving the needle-biopsy rate for cancer and reducing the mastectomy reoperation rate, because sentinel nodes can then be excised during the initial mastectomy in patients found to have invasive cancer.

Capturing exceptions also allowed for accurate attribution assignments. For example, in our registry, a surgeon can attribute a reoperation after mastectomy to themselves, such as for axillary bleeding, or to the plastic surgeon for flap donor site bleeding.

Benchmarking

Benchmarking (profiling) means that participants can compare their performance to others and is a method for quality improvement.23,31,39,53,56,57 Benchmarking programs differ. Navathe et al. recently summarized eight different design factors.56 Using this categorization, our program is identity-blind, reports textually (not graphically), encourages high-value care, discourages low-value care, compares an individual to a group, contains measures with both higher and lower levels of evidence supporting them, has a national scope, and to our knowledge has not resulted in any unintended adverse outcome.

The term benchmark means a point of reference. A benchmark may simply be an observation of results of contemporary care, perhaps when first described in a specific patient population.39,58 A benchmark also can be an organizational target goal, such as a zero percent infection rate, or a data-driven reference, reached when content experts scrutinize observed ranges of performance and subsequently endorse a specific percentile.40,41,42,43 In 2008, 24 breast cancer experts attended a workshop in Europe and established benchmarks for 17 quality indicators for breast centers, calling them minimum standards and quality targets.28,53 The establishment of a quality target is a known method for improving quality beyond the effect of peer comparison.35 Recognizing this concept, CMS requires that QCDR stewards determine ABC benchmarks for each of their QM.38 Conceptually, the ASBrS Board of Directors agreed that benchmarks can be catalysts for improvement. After application of the ABC formula, the CMS benchmarks for six of our QM were 100% "performance met." After review, the ASBrS Board concluded that achieving perfection in every patient encounter was desirable but should not be considered the "standard of care"; nor should "performance not met" be considered a "never" event. As a result, the ASBrS Quality Committee and Board reviewed the member performance presented here, as well as relevant literature, and then endorsed different benchmarks that reflected high-quality, clinically achievable care (Table 2).

Was our Quality Program the Driver of Observed Improvements in Performance?

For the first three QM, which measured needle biopsy, specimen imaging, and specimen orientation rates in our program before 2011, there was marked improvement compared with 2011–2015.13 Overall, there was significant improvement for seven of our nine QM from 2011 to 2015. Whether this improvement was directly related to our measurement and benchmarking, that is, the natural consequence of measurement driving improvement, cannot be conclusively determined, given multiple competing explanations. These potential confounders include changes in QM specifications over time, as well as our own educational programs and scholarly publications within and outside our Society.53

Program and Study Strengths and Limitations

The strengths and limitations of the ASBrS Mastery patient registry have been described elsewhere.44 Strengths include large sample sizes, immediate peer comparison, and appropriate attribution assignments.44 In addition, our registry is flexible in terms of its ability to capture additional data fields after appropriate vetting by the society and in its ability to output data across a number of domains. While it was initially developed for quality measurement, it also has been used for clinical outcomes research.13,59,60,61,62

Limitations are recognized.44 A selection bias is possible because the surgeons who self-select to participate may be "above average." They may share certain characteristics, such as a focus on quality and safety, better resources, or a different case-mix compared with nonparticipating surgeons. If so, our results may not be reproducible in other settings. Because of this concern, the ASBrS Board agreed to offer participation to non-ASBrS members for pilot studies. Other limitations include an unknown rate of nonconsecutive case entry and an unknown rate of surgeon dropout due to perceived poor performance. In addition, most of our QM are process rather than outcome measures, and we are not providing risk-adjusted comparisons. As a result, investigations are underway to identify the interactions between patient, surgeon, and facility characteristics that affect our measured outcomes. Lastly, we have not performed formal reliability testing of our measures or applied advanced analytic tools to disentangle each surgeon's intrinsic performance from that of the supporting institution.63,64

Conclusions

The ASBrS successfully constructed an electronic patient registry and then engaged breast surgeons to capture more than a million organ-specific QM encounters, providing proof of surgeons' commitment to self-assessment as well as evidence of our society's compliance with a mission "continually to improve the practice of breast surgery."65 Functionality was provided for surgeon profiling, program data were used to establish quality targets, and a service was provided to surgeon members allowing them to participate in CMS-incentivized reimbursement programs. Much work remains, including developing more advanced analytic methods for benchmarking and deciding when to retire existing measures that may have "topped out." For now, we encourage all surgeons not participating in our program to compare their personal performance to our benchmarks. In addition, we are currently searching for inequities and disparities of care by surgeon and patient characteristics.