Benchmarking American Society of Breast Surgeons Member Performance for More Than a Million Quality Measure-Patient Encounters

Background
Nine breast cancer quality measures (QM) were selected by the American Society of Breast Surgeons (ASBrS) for the Centers for Medicare and Medicaid Services (CMS) Quality Payment Program (QPP) and other performance improvement programs. We report member performance.

Study Design
Surgeons entered QM data into an electronic registry. For each QM, aggregate "performance met" (PM) was reported (median, range, and percentiles), and benchmarks (target goals) were calculated by CMS methodology, specifically the Achievable Benchmark of Care™ (ABC™) method.

Results
A total of 1,286,011 QM encounters were captured from 2011-2015. For 7 QM, first and last PM rates were as follows: (1) needle biopsy (95.8, 98.5%), (2) specimen imaging (97.9, 98.8%), (3) specimen orientation (98.5, 98.3%), (4) sentinel node use (95.1, 93.4%), (5) antibiotic selection (98.0, 99.4%), (6) antibiotic duration (99.0, 99.8%), and (7) no surgical site infection (98.8, 98.9%); all p values < 0.001 for trend. Variability and reasons for noncompliance by surgeon for each QM were identified. The CMS-calculated target goals (ABC™ benchmarks) for PM were 100% for 6 QM, implying that not meeting performance is a "never should occur" event.

Conclusions
Surgeons self-reported a large number of specialty-specific patient-measure encounters into a registry for self-assessment and participation in the QPP. Despite high levels of performance demonstrated initially in 2011 with minimal subsequent change, the ASBrS concluded that "perfect" performance was not a realistic goal for the QPP. Thus, after review of our normative performance data, the ASBrS recommended different benchmarks than CMS for each QM.

Within Mastery, surgeons can view their own performance and immediately compare themselves with other surgeons after they enter data. As early as 2009, nearly 700 member surgeons of the ASBrS demonstrated their commitment to QM reporting by entering data on 3 QM for each of 28,000 breast cancer cases. 13 We updated the results of the ASBrS measurement program for those QM accepted by the Centers for Medicare and Medicaid Services (CMS) for their quality payment programs (QPP) (Table 1). 36 Our purpose was to provide transparency of member performance, investigate variability of care, and describe how this information was used to develop quality targets (benchmarks). To our knowledge, we report the largest sample of breast surgeon-entered QM encounters assembled to date.

METHODS
De-identified QM data were obtained from the ASBrS for the years 2011-2015. Due to de-identification, the Institutional Review Board of the Gundersen Health System deemed the study was not human subject research; the need for formal IRB approval was waived.

CMS Rules and Formulas
All QM must be specified with inclusion, exclusion, and exception criteria (Table 1). 36,37 Using "performance met" (PM) and "performance not met" (PNM) counts for each QM, the performance rate (PR) was calculated as PR = PM / (PM + PNM) × 100%. To calculate the total number of surgeon-patient-measure encounters captured in Mastery, we summed the total reports for each individual QM across all study years and all providers who entered data. Statistical Analysis Software, version 9.3 (SAS Institute Inc., Cary, NC) was used to report performance.
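The performance-rate calculation can be expressed as a short sketch. This is purely illustrative; the function name and example counts are invented and are not part of the registry software:

```python
# Illustrative sketch of the performance-rate (PR) calculation:
# PR = PM / (PM + PNM) * 100. Function name and counts are invented.

def performance_rate(pm, pnm):
    """Performance rate, in percent, from counts of "performance met"
    (PM) and "performance not met" (PNM) encounters. CMS "exception"
    encounters are removed upstream, so they appear in neither count."""
    eligible = pm + pnm
    if eligible == 0:
        raise ValueError("no eligible encounters for this measure")
    return 100.0 * pm / eligible

# Example: 958 of 1000 eligible encounters met the measure.
print(round(performance_rate(958, 42), 1))  # 95.8
```

Note that because exceptions are excluded before the denominator is formed, a justified exception neither helps nor penalizes a surgeon's rate.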
Benchmarks for performance for each QM were calculated by the Achievable Benchmark of Care™ (ABC™) methodology recommended by CMS. 38,39 ABC™ benchmarks were reviewed by the ASBrS Board of Directors in person on January 22, 2016. By the ABC™ method, calculated benchmarks for six QM were 100% performance met; thus, for these measures, performance not met became a de facto "never-should-occur" event. As a result, the Patient Safety and Quality (PSQ) and executive committees recommended different benchmarks based on our member normative performance data and society expert opinion. This methodology of setting a target goal for passing a test has been termed a modified Angoff approach by educators and is similar to the process used by the European Society of Breast Cancer Specialists (EUSOMA). 28,40-43 To assess annual trends in performance, the Cochran-Armitage test was used.
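The ABC™ "pared-mean" calculation can be sketched as follows. The Bayesian adjustment (x + 1)/(n + 2) and the rule of pooling the best performers covering at least 10% of encounters follow published descriptions of the ABC method; the per-surgeon counts below are invented for illustration:

```python
# Sketch of the Achievable Benchmark of Care (ABC) pared-mean method.
# Per-surgeon data are invented; this is not the registry's software.

def abc_benchmark(providers, top_fraction=0.10):
    """providers: list of (pm, n) tuples per surgeon, where pm is
    "performance met" count and n is eligible encounters.
    1. Rank surgeons by the adjusted rate (pm + 1) / (n + 2), a
       Bayesian adjustment that keeps tiny denominators from dominating.
    2. Pool the best performers until they cover at least top_fraction
       of all encounters.
    3. The benchmark is the pooled performance of that subset."""
    total = sum(n for _, n in providers)
    ranked = sorted(providers, key=lambda p: (p[0] + 1) / (p[1] + 2), reverse=True)
    pm_sum = n_sum = 0
    for pm, n in ranked:
        pm_sum += pm
        n_sum += n
        if n_sum >= top_fraction * total:
            break
    return 100.0 * pm_sum / n_sum

# Invented example: five surgeons' (performance met, eligible) counts.
data = [(98, 100), (47, 50), (190, 200), (60, 75), (500, 520)]
print(round(abc_benchmark(data), 1))  # 98.0
```

Because the benchmark is computed only from the top-performing subset, near-universal compliance among participants pushes the calculated benchmark toward 100%, which is exactly the situation the Board confronted for six QM.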

Society Actions
The ASBrS performed an annual review of participating member performance for the QM captured in Mastery. The results were presented to their Board of Directors by the PSQ Committee. Initiatives to address quality concerns were then discussed or planned.

Encounters Captured
A total of 1,286,011 unique provider-patient-measure encounters were captured in Mastery during 2011-2015 for 9 QCDR QM. 44 Encounters varied by QM, from 275,619 for the specimen orientation QM to 2680 for a recently introduced hereditary risk QM (Table 2). The number of encounters differed by QM due to each measure's eligibility requirements and the time point when it first became available for reporting. The dropout rate of surgeons who had entered data in prior years but did not enter any encounters for the last reporting year (2015) was 43% (354/832).

Performance
Performance and benchmarks are shown in Table 2. Performance variability and trends are shown in Fig. 1 and Table 3. The initial and final performance met rates for seven QM from 2011-2015 were as follows: needle biopsy (NB) (95.8, 98.5%), specimen imaging (97.9, 98.8%), specimen orientation (98.5, 98.3%), sentinel node use (95.1, 93.4%), antibiotic selection (98.0, 99.4%), antibiotic duration (99.0, 99.8%), and no surgical site infection (98.8, 98.9%); all p values < 0.001 for trend. The hereditary assessment and unplanned reoperation after mastectomy QM were not included in the trend analysis because they are new measures. Additional detail is provided in Table 4.
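The Cochran-Armitage trend statistic used for these yearly rates can be illustrated with a small self-contained sketch; the yearly counts below are invented for demonstration and are not registry data:

```python
# Illustrative Cochran-Armitage test for a linear trend in yearly
# "performance met" proportions. Yearly counts below are invented.
import math

def cochran_armitage(successes, totals, scores=None):
    """successes[i] = performance-met count in year i;
    totals[i] = eligible encounters in year i.
    Returns (z, two-sided p) using the normal approximation.
    Scores default to 0, 1, 2, ... (equally spaced years)."""
    if scores is None:
        scores = list(range(len(totals)))
    n_total = sum(totals)
    p_bar = sum(successes) / n_total
    t_stat = sum(t * (x - n * p_bar)
                 for t, x, n in zip(scores, successes, totals))
    var = p_bar * (1 - p_bar) * (
        sum(n * t * t for t, n in zip(scores, totals))
        - sum(n * t for t, n in zip(scores, totals)) ** 2 / n_total
    )
    z = t_stat / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p value
    return z, p

# Invented example: a measure trending upward over five years.
z, p = cochran_armitage([958, 965, 975, 980, 985], [1000] * 5)
print(f"z = {z:.2f}, p = {p:.1e}")
```

As the example suggests, with denominators in the hundreds of thousands even very small absolute yearly changes yield p < 0.001, which is why statistical and clinical significance are discussed separately below.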

Background
In 2008, the ASBrS launched its Mastery program for breast surgeons to document the quality of their clinical performance. 13 After modified Delphi ranking, 9 of 144 breast surgical QM were chosen for ASBrS member self-assessment, benchmarking, and CMS QPP. 44 Program developers and ranking participants were diverse in practice location and type, including nonspecialty breast surgeons.

Participation
By 2017, spurred by landmark legislation and the need to improve quality, nearly 70 organizations had developed patient registries for clinicians to report more than 300 QM to CMS. 36,45-48 Our registry, which started much earlier, has already successfully captured over one million unique patient-measure encounters and provided real-time benchmarking.

Performance
There was a high rate of performance for eight of our nine measures; for these eight, compliance was met in more than 94% of patient encounters. Notable examples include the high rate of preoperative diagnosis of breast cancer made by needle biopsy (97.5%) and the low rates of surgical site infection and unplanned reoperation after mastectomy, both less than 2%. This level of performance exceeded most historical reports. 18,49-51 The QM with the lowest aggregate performance was "documentation of surgeon hereditary assessment of a newly diagnosed breast cancer patient" at 86%. Overall performance for the other eight QM was excellent. However, we recognize that disparities of care may still be present.

During the last 6 years of measurement, there were statistically significant changes in performance for all measures. Despite both upward- and downward-trending changes, the absolute differences by year were small, all less than 3%, which raises the question of their clinical significance.

Because the performance level of surgeons reporting in Mastery is so high, these surgeons may be a self-selected group of high performers. Supporting this concept, surgeons voluntarily reporting in a cardiac surgery registry demonstrated better performance than nonparticipants. 52 Our findings of such high performance in the initial study years, followed by minimal annual change, are similar to a recent report from European breast centers. 53 When this scenario occurs, there is concern that these QM may have "topped out," leaving less opportunity for future improvement. However, because the level of performance of nonparticipants in our program is unknown, we have not yet retired our QM; rather, by continuing to support them, we endorse their importance inside and outside our society membership.
Although aggregate performance rates were high, variability of performance existed, best demonstrated by histograms (Fig. 1). Whenever variability coexists with evidence that high performance is achievable, there is opportunity to improve overall care. 54

When Performance is Not Met, What Can We Learn?
The most common reasons for not meeting performance for each measure are documented in Table 3. Even with high overall performance, there is value in identifying causes of measure noncompliance; understanding causation affords opportunity to improve. For example, one reason for omission of a needle biopsy for diagnosing cancer was "needle biopsy not available in my community," which represents a system and resource issue rather than a surgeon-specific issue. Supporting solutions, the ASBrS has provided education and certification for both ultrasound-guided and stereotactic core needle biopsy. 55 In another example, the second most common reason that patients underwent an unplanned reoperation after mastectomy was a positive margin. Potentially, surgeons learning that they are comparative outliers for margin involvement may reevaluate their care processes to better assess the cancer's proximity to the mastectomy margins preoperatively. If performance is not met for a QM due to a justifiable "nonquality" reason, then CMS defines this encounter as an "exception." In such cases, the encounter did not penalize the surgeon, because it was not included in the performance rate. An exception to not meeting performance for achieving a cancer diagnosis by needle biopsy occurred in 8264 patients undergoing prophylactic mastectomy and in 1814 patients having an imaging abnormality that was "too close" to an implant or the chest wall to permit safe needle biopsy. This granular level of information potentially aids improvement strategies.
For example, in high-risk patients undergoing risk-reducing mastectomy, surgeons ought to pursue guideline-concordant preoperative imaging to identify nonpalpable cancers, thereby improving the needle-biopsy rate for cancer and reducing the mastectomy reoperation rate, because sentinel nodes can be excised during the initial mastectomy in patients found to have invasive cancer.
Capturing exceptions also allowed for accurate attribution assignments. For example, in our registry, a surgeon can attribute a reoperation after mastectomy to themselves, such as for axillary bleeding, or to the plastic surgeon for flap donor site bleeding.

Benchmarking
Benchmarking (profiling) means that participants can compare their performance to others and is a method for quality improvement. 23,31,39,53,56,57 Benchmarking programs differ. Navathe et al. recently summarized eight different design factors. 56 Using this categorization, our program is identity-blind, reports textually (not graphically), encourages high-value care, discourages low-value care, compares an individual to a group, contains measures with both higher and lower levels of evidence supporting them, has a national scope, and to our knowledge has not resulted in any unintended adverse outcome.
The term benchmark means a point of reference. A benchmark may simply be an observation of results of contemporary care, perhaps when first described in a specific patient population. 39,58 A benchmark also can be an organizational target goal, such as a zero percent infection rate, or a data-driven reference, reached when content experts scrutinize observed ranges of performance and subsequently endorse a specific percentile. 40-43 In 2008, 24 breast cancer experts attended a workshop in Europe and established benchmarks for 17 quality indicators for breast centers, calling them minimum standards and quality targets. 28,53 The establishment of a quality target is a known method for improving quality beyond the effect of peer comparison. 35 Recognizing this concept, CMS requires that QCDR stewards determine ABC™ benchmarks for each of their QM. 38 Conceptually, the ASBrS Board of Directors agreed that benchmarks can be catalysts for improvement. After application of the ABC™ formula, the CMS benchmarks for six of our QM were 100% "performance met." After review, the ASBrS Board concluded that achieving perfection in every patient encounter was desirable but should not be considered the "standard of care"; nor should "performance not met" be considered a "never" event. As a result, the ASBrS Quality Committee and Board reviewed the member performance presented here, as well as relevant literature, and then endorsed different benchmarks that reflected high-quality and clinically achievable care (Table 2).

Was our Quality Program the Driver of Observed Improvements in Performance?
For the first three QM, which measured needle biopsy, specimen imaging, and specimen orientation rates in our program before 2011, there was marked improvement compared with 2011-2015. 13 Overall, there was significant improvement for seven of our nine QM from 2011 to 2015. Whether this improvement was directly related to our measurement and benchmarking, that is, the natural consequence of measurement driving improvement, cannot be conclusively determined, given multiple competing explanations. These potential confounders include changes in QM specifications over time, as well as our own educational programs and scholarly publications within and outside our Society. 53

Program and Study Strengths and Limitations
The strengths and limitations of the ASBrS Mastery patient registry have been described elsewhere. 44 Strengths include large sample sizes, immediate peer comparison, and appropriate attribution assignments. 44 In addition, our registry is flexible in its ability to capture additional data fields after appropriate vetting by the society and to output data across a number of domains. While it was initially developed for quality measurement, it also has been used for clinical outcomes research. 13,59-62

Limitations are recognized. 44 A selection bias is possible because surgeons who self-select to participate may be "above average"; they may share certain characteristics, such as a focus on quality and safety, better resources, or a different case-mix compared with nonparticipating surgeons. If so, our results may not be reproducible in other settings. Because of this concern, the ASBrS Board agreed to offer participation to non-ASBrS members for pilot studies. Other limitations include an unknown rate of nonconsecutive case entry and an unknown rate of surgeon dropout due to perceived poor performance. In addition, most of our QM are process rather than outcome measures, and we do not provide risk-adjusted comparisons. As a result, investigations are underway to identify the interactions among patient, surgeon, and facility characteristics that affect our measured outcomes. Lastly, formal reliability testing of our measures and advanced analytic tools to disentangle each surgeon's intrinsic performance from that of their supporting institution have not been performed. 63,64

CONCLUSIONS
The ASBrS successfully constructed an electronic patient registry and engaged breast surgeons to capture more than a million organ-specific QM encounters, providing proof of surgeons' commitment to self-assessment as well as evidence of our society's compliance with its mission "continually to improve the practice of breast surgery." 65 Functionality was provided for surgeon profiling, program data were used to establish quality targets, and a service was provided to surgeon members allowing them to participate in CMS incentivized reimbursement programs. Much work remains, including developing more advanced analytic methods for benchmarking and deciding when to retire existing measures that may have "topped out." For now, we encourage all surgeons not participating in our program to compare their personal performance to our benchmarks. In addition, we are currently searching for inequities and disparities of care by surgeon and patient characteristics.
ACKNOWLEDGEMENT The authors thank Sharon Grutman for ASBrS Patient Safety and Quality Committee support, Mena Jalali for Mastery℠ Workgroup support, Margaret and Ben Schlosnagle for Mastery℠ technology support, and the Mastery℠ Workgroup members who provide oversight and practical improvements to the Mastery patient registry (Co-Chairs Linda Smith and Kathryn Wagner; members Eric Brown, Regina Hampton, Thomas Kearney, Alison Laidley, and Jason Wilson). The authors also thank Choua Vang for assistance in manuscript preparation, and the Gundersen Medical Foundation and the Norma J. Vinger Center for Breast Care for financial and statistical support.

DISCLOSURE All authors: none.
OPEN ACCESS This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.