The American Society of Breast Surgeons and Quality Payment Programs: Ranking, Defining, and Benchmarking More Than 1 Million Patient Quality Measure Encounters

Background To identify and remediate gaps in the quality of surgical care, the American Society of Breast Surgeons (ASBrS) developed surgeon-specific quality measures (QMs), built a patient registry, and nominated itself to become a Centers for Medicare and Medicaid Services (CMS) Qualified Clinical Data Registry (QCDR), thereby linking surgical performance to potential reimbursement and public reporting. This report summarizes the program's development. Methods Using a modified Delphi process, more than 100 measures of care quality were ranked. In compliance with CMS rules, selected QMs were specified with inclusion, exclusion, and exception criteria, then incorporated into an electronic patient registry. After surgeons entered QM data into the registry, the ASBrS provided real-time peer performance comparisons. Results After ranking, 9 of 144 measures of quality were chosen, submitted, and subsequently accepted by CMS as a QCDR in 2014. The measures selected were diagnosis of cancer by needle biopsy, surgical-site infection, mastectomy reoperation rate, and the appropriateness of specimen imaging, intraoperative specimen orientation, sentinel node use, hereditary assessment, antibiotic choice, and antibiotic duration. More than 1 million patient-measure encounters were captured from 2010 to 2015. Benchmarking functionality with peer performance comparison was successful. In 2016, the ASBrS provided public transparency on its website for the 2015 performance reported by our surgeon participants. Conclusions In an effort to improve quality of care and to participate in CMS quality payment programs, the ASBrS defined QMs, tracked compliance, provided benchmarking, and reported breast-specific QMs to the public.

Quantification of performance can identify variation and opportunities for improvement. If performance assessment is followed by performance comparison among peers (i.e., benchmarking) coupled with transparency among providers, physicians who find themselves in the lower tiers of performance can be motivated to improve, ultimately yielding better overall care at the population level, a phenomenon that recently has been reviewed and demonstrated by several programs. [20][21][22][23][24][25][26] This report describes how the American Society of Breast Surgeons (ASBrS) ranked and defined measures of quality of care and subsequently provided benchmarking functionality for its members to compare their performances with each other. In separate investigations, the actual performance demonstrated by our ASBrS membership for compliance with nine breast surgeon-specific QMs is reported.
Founded in 1995, the ASBrS is a young organization. Yet, within 20 years, membership has grown to more than 3000 members from more than 50 countries. A decade ago, the Mastery of Breast Surgery Program (referred to as "Mastery" in this report) was created as a patient registry to collect quality measurement data for its members. 27 Past President Eric Whitacre, who programmed Mastery's original electronic patient registry with his son Thomas, understood that "quality measures, in their mature form, did not merely serve as a yardstick of performance, but were a mechanism to help improve quality." 28,29 Armed with this understanding, the ASBrS integrated benchmarking functionality into Mastery, thus aligning the organization with the contemporary principles of optimizing cancer care quality as described by policy stakeholders. 2,19,25,30 In 2010, Mastery was accepted into the Centers for Medicare and Medicaid Services (CMS) Physician Quality Reporting System (PQRS) and then as a Qualified Clinical Data Registry (QCDR) in 2014, linking provider performance to government reimbursement and public reporting. 31 Surgeons who successfully participated in Mastery in 2016 will avoid the 2018 CMS "payment adjustment" (2% penalty), a further step toward incentivizing performance improvement in tangible ways.

Institutional Review Board
De-identified QM data were obtained with permission from the ASBrS for the years 2011-2015. The Institutional Review Board (IRB) of the Gundersen Health System deemed that the study did not constitute human subjects research. The need for IRB approval was waived.
Choosing, Defining, and Vetting QM

From 2009 to 2016, the Patient Safety and Quality Committee (PSQC) of the ASBrS solicited QM domains from its members and reviewed those of other professional organizations. [32][33][34][35][36][37][38][39] As a result, as early as 2010, a list of more than 100 domains of quality had been collected, covering all the categories of the Donabedian trilogy (structure, process, and outcomes) and the National Quality Strategy (safety, effectiveness, efficiency, population health, care communication/coordination, patient-centered experience). 40,41 By 2013, a list of 144 measures underwent three rounds of modified Delphi ranking by eight members of the PSQC, using the RAND/UCLA Appropriateness Methodology, which replicated an American College of Surgeons effort to rank melanoma measures and was consistent with the National Quality Forum's guide to QM development 42,43 (Tables 1, 2). During the ranking, quality domains were assigned a score of 1 (not valid) to 9 (valid), with a score of 5 denoting uncertain/equivocal validity. After each round of ranking, the results were discussed within the PSQC by email and phone conferences, and arguments were presented for and against each QM and its rank. A QM was deemed valid if 90% of its rankings were in the range of 7 to 9.
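The panel validity rule described above can be sketched as a short calculation; this is a minimal illustration (the function name and data layout are assumptions, not part of the Mastery program):

```python
def is_valid_qm(scores, lower=7, upper=9, threshold=0.90):
    """Apply the panel validity rule: a candidate measure is deemed
    valid when at least `threshold` (90%) of its rankings on the
    1-9 scale fall in the [lower, upper] = [7, 9] range."""
    in_range = sum(lower <= s <= upper for s in scores)
    return in_range / len(scores) >= threshold

# Example: eight panelists score a candidate measure on the 1-9 scale.
panel = [7, 8, 9, 9, 7, 8, 9, 7]
print(is_valid_qm(panel))  # all eight rankings fall in 7-9 -> True
```

With an eight-member panel, the 90% threshold effectively requires all eight rankings to fall between 7 and 9, since 7 of 8 is only 87.5%.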
After three rounds of ranking ending in December 2013, nine of the highest ranked measures were "specified" as described and required by CMS 44 (Table 3). Briefly, exclusions to QM reporting were never included in the performance numerator or denominator. Exceptions were episodes in which performance for a given QM was not met but there was a justifiable reason why that was the case. If so, then the encounter, similar to an exclusion, was not included in the surgeon's performance rate. If an encounter met performance criteria despite also meeting exception criteria, the encounter was included in the performance rate. Per CMS rules, each QM was linked to a National Quality Strategy Aim and Domain (Table 3). The QMs also were assigned to a Donabedian category and to one or more of the Institute for Healthcare Improvement's "triple aims." 40,45 Each of our QMs underwent vetting in our electronic patient registry (Mastery) by a workgroup before submission to CMS. During this surveillance, a QM was modified, retired, or advanced to the QCDR program based on member input and ASBrS Executive Committee decisions.
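The exclusion and exception rules above can be summarized as a small rate calculation; this is a sketch of the logic only, with hypothetical field names, not the registry's actual implementation:

```python
def performance_rate(encounters):
    """Compute a surgeon's performance rate under the rules described
    above: exclusions never enter the numerator or denominator, and an
    encounter meeting exception criteria is dropped only when
    performance was NOT met. Field names ('met', 'excluded',
    'exception') are illustrative assumptions."""
    met = eligible = 0
    for e in encounters:
        if e["excluded"]:
            continue  # exclusion: never counted at all
        if e["exception"] and not e["met"]:
            continue  # justified miss: treated like an exclusion
        eligible += 1        # counted in the denominator
        met += e["met"]      # a met encounter also counts in the numerator
    return met / eligible if eligible else None
```

Note that an encounter that meets performance despite also satisfying exception criteria stays in the rate, exactly as the text specifies.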

Patient Encounters
To calculate the total number of provider-patient-measure encounters captured in Mastery, we summed the total reports for each individual QM for all study years and all providers who entered data.
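The tally described above is a straightforward aggregation; the following sketch uses a hypothetical flat extract of the registry (the row layout is an assumption):

```python
from collections import Counter

# Hypothetical registry extract: one row per provider-patient-measure report.
reports = [
    {"provider": "S01", "measure": "needle_biopsy", "year": 2011},
    {"provider": "S01", "measure": "ssi", "year": 2012},
    {"provider": "S02", "measure": "needle_biopsy", "year": 2012},
]

# Tally reports per individual QM across all study years and providers,
# then sum the per-measure tallies for the grand total.
per_measure = Counter(r["measure"] for r in reports)
total_encounters = sum(per_measure.values())
```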

Benchmarking
Each surgeon who entered data into Mastery was able to compare his or her up-to-date performance with the aggregate performance of all other participating surgeons (Fig. 1). The surgeons were not able to access the performance metrics of any other named surgeon or facility.
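The comparison shown to each surgeon pairs an individual rate with the pooled rate of all other participants; a minimal sketch, assuming per-surgeon met/eligible tallies for a single measure (the function and data names are hypothetical):

```python
def benchmark(counts, surgeon_id):
    """Return (own_rate, peer_rate) for one measure: the surgeon's own
    performance-met rate versus the aggregate rate of every other
    participant combined. `counts` maps an anonymized surgeon id to a
    (met, eligible) tally; no individual peer's rate is exposed."""
    own_met, own_eligible = counts[surgeon_id]
    peer_met = sum(met for sid, (met, _) in counts.items() if sid != surgeon_id)
    peer_eligible = sum(n for sid, (_, n) in counts.items() if sid != surgeon_id)
    return own_met / own_eligible, peer_met / peer_eligible
```

Returning only the pooled peer rate mirrors the program's design: a surgeon sees the aggregate, never another named surgeon's metrics.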

Data Validation
In compliance with CMS rules, a data validation strategy was performed annually. A blinded random selection of at least 3% of QCDR surgeon participants was conducted. After surgeons were selected for review, the ASBrS requested that they submit electronic and/or paper records so the ASBrS could verify that their office/hospital records supported the performance "met" and "not met" categories that they had previously reported via the Mastery registry.
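The annual selection step can be sketched as follows; this is an illustration of a blinded random draw at the stated 3% minimum, not the registry's actual audit code (the function name and seeding are assumptions):

```python
import math
import random

def select_audit_sample(participants, fraction=0.03, seed=None):
    """Blinded random selection of at least `fraction` (3%) of QCDR
    surgeon participants for the annual record review, rounding up so
    the minimum is always met even for small panels."""
    k = max(1, math.ceil(len(participants) * fraction))
    return random.Random(seed).sample(participants, k)
```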

Hierarchical Order and CMS QCDR Choices
The median ranking scores for 144 potential QMs ranged from 2 to 9 (Table 2). The nine QMs chosen and their ranking scores were appropriate use of preoperative needle biopsy (9.0), sentinel node surgery (9.0), specimen imaging (9.0), specimen orientation (9.0), hereditary assessment (7.0), mastectomy reoperation rate (7.0), preoperative antibiotics (7.0), antibiotic duration (7.0), and surgical-site infection (SSI) (6.0). The specifications for these QMs are presented in Table 3. The mastectomy reoperation rate and SSI are outcome measures, whereas the remainder are process of care measures.

QM Encounters Captured
A total of 1,286,011 unique provider-patient-measure encounters were captured in Mastery during 2011-2015 for the nine QCDR QMs. Performance metrics and trends for each QM are reported separately.

Data Validation
The rate of inaccurate QM reporting by surgeons participating in the 2016 QCDR data validation study of the 2015 Mastery data files was 0.82% (27 errors in 3285 audited patient-measure encounters). Subsequent reconciliation of discordance between surgeon QM reporting and patient clinical data occurred by communication between the ASBrS and the reporting provider.

CMS Acceptance and Public Transparency
The Centers for Medicare and Medicaid Services accepted the ASBrS QMs submitted for PQRS participation in 2010-2013 and for the QCDR in 2014-2016. In 2016, CMS discontinued the specimen orientation measure for future reporting and recommended further review of the mastectomy reoperation rate measure. Public reporting of 2015 individual surgeon QCDR data was posted in 2016 on the ASBrS website.

Security
To our knowledge, no breaches have occurred with any surgeon-user of Mastery identifying the performance of any other surgeon or the identity of any other surgeon's patients. In addition, no breaches by external sources have occurred within the site or during transmission of data to CMS.

Modified Delphi Ranking of QM
To provide relevant QMs for our members, the PSQC of the ASBrS completed a hierarchical ranking of more than 100 candidate measures and narrowed the collection of QMs to fewer than a dozen using accepted methods. 42,43,46 Based on our experience, we recommend the modified Delphi process for others wanting to prioritize longer lists of potential QM domains into shorter lists. These lists are iterative, allowing potential measures to be added at any time, such as after the publication of clinical trials or after new evidence-based guidelines are developed for better care. In addition, with the modified Delphi ranking process, decisions are made by groups, not individuals.
After Ranking, What Next?
Of the nine QMs selected for submission to CMS, only four had the highest possible ranking score. The reasons for not selecting some highly ranked domains of care included but were not limited to the following concerns. Some QMs were already being used by other organizations or were best assessed at the institutional, not the surgeon, level, such as the use of radiation after mastectomy for node-positive patients. [32][33][34][35][36] Other highly ranked measures, such as "adequate history," were not selected because they were considered standard of care.
Contralateral prophylactic mastectomy rates, a contemporary topic of much interest, were not included in our original ranking, and breast-conserving therapy (BCT) was not ranked high due to our concern that both were more a reflection of patient preferences and of regional and cultural norms than of surgeon quality. A lumpectomy reoperation QM was ranked high (7.5) but was not chosen due to disagreement within the ASBrS whether to brand    47,48 In some cases, QMs with lower scores were selected for specific reasons. For example, by CMS rules, two QMs for a QCDR must be "outcome" measures, but all our highest ranked measures were "process of care" measures.
There was occasional overlap between our QMs and those of other organizations. 21,[32][33][34][35][36][37][38][39] In these cases, we aimed to harmonize with, not compete against, existing measures. For example, a patient with an unplanned reoperation after mastectomy would be classified similarly in both the National Surgical Quality Improvement Program (NSQIP) and our program. In contrast to NSQIP, we classified a patient with postoperative cellulitis as having an SSI. Because excluding cellulitis as an SSI event has been estimated to reduce breast SSI rates threefold, adoption of the NSQIP definition would underestimate the SSI burden to breast patients and could limit improvement initiatives. 49

Governance
Ranking and specifying QMs is arduous. Consensus is possible; unanimous agreement is rare. Therefore, a governance structure is necessary to reconcile differences of opinion. In our society, the PSQC solicits, ranks, and specifies QMs. A workgroup vets them for clarity and workability. In doing so, the workgroup may recommend changes. The ASBrS Executive Committee reconciles disputes and makes final decisions.

Reporting Volume
Our measurement program was successful, capturing more than 1 million provider-patient-measure encounters. On the other hand, our member participation rate was less than 20%. By member survey (not reported here), the most common reason for not participating was "burden of reporting."

Benchmarking
"Benchmarking" is a term used most often as a synonym for peer comparison, and many programs purport to provide it. 25 In actuality, benchmarking is a method for improving quality and one of nine levers endorsed by the National Quality Strategy to upgrade performance. 21,23,30,50 Believing in this concept, the ASBrS and many other professional societies built patient registries that provide benchmarking. 21,25,[32][33][34][35] In contradistinction, the term "benchmark" refers to a point of reference for comparison. Thus, a performance benchmark can have many different meanings, ranging from a minimal quality threshold to a standard for superlative performance. 24,36

PROGRAM STRENGTHS
Our patient registry was designed to collect specialty-specific QMs as an alternative to adopting existing general surgical and cross-cutting measures. Cross-cutting measures, such as those that audit medication reconciliation or care coordination, are important but do not advance specialty-specific practice. Furthermore, breast-specific measures lessen potential bias in the comparison of providers who have variable proportions of their practice devoted to the breast. Because alimentary tract, vascular, and trauma operations tend to have higher morbidity and mortality event rates than breast operations, general surgeons performing many non-breast operations are not penalized in our program for a case mix that includes these higher-risk patients. In other words, nonspecialized general surgeons who want to demonstrate their expertise in breast surgery can do so by peer comparison with surgeons who have similar case types in our program. In addition, a condition-specific program with public transparency allows patients to make more informed choices regarding their destination for care. In 2016, individual provider report-carding for our participating surgeons began on the "physician-compare" website. 51 Another strength of an organ-specific registry is that it affords an opportunity for quick Plan-Do-Study-Act (PDSA) cycles because personal and aggregate performance are updated continuously. Thus, action plans can be driven by subspecialty-specific data, not limited to expert opinion or claims data. For example, a national consensus conference was convened, in part, due to an interrogation of our registry that identified wide variability of ASBrS member surgeon reoperation rates after lumpectomy. 52,53 Other program strengths are listed in Table 4.

STUDY LIMITATIONS
Although risk-adjusted peer comparisons are planned, to date, we are not providing them. In addition, only the surgeons who participate with CMS through our QCDR sign an "attestation" statement that they will enter "consecutive patients," and no current method is available for cross-checking the Mastery case log with facility case logs for completeness. Recognizing that nonconsecutive case entry (by non-QCDR surgeons) could alter surgeon performance rates, falsely elevating them, one investigation of Mastery compared the performance of a single quality indicator between QCDR- and non-QCDR-participating surgeons. 52 Performance did not differ, but this analysis has not been performed for any of the QMs described in this report. Surgeons also can elect to opt out of reporting QMs at any time. The percentage of surgeons who do so due to their perception of comparatively poor performance is unknown. If significant, this self-selected removal from the aggregate data would confound overall performance assessment, falsely elevating it.
Another limitation is our development of QMs by surgeons with minimal patient input and no payer input. As a result, we cannot rule out that these other stakeholders may have a perception of the quality of care delivered to them that differs from our perception. For example, patients might rank timeliness of care higher than we did, and payers of care might rank reoperations the highest, given their association with cost of care. We may not even be measuring some domains of care that are most important to patients because we did not uniformly query their values and preferences upfront during program development, as recommended by others. 2,54 See Table 4 for other limitations.

CONCLUSION
In summary, the ASBrS built a patient registry to audit condition-specific measures of breast surgical quality and subsequently provided peer comparison at the individual provider level, hoping to improve national performance. In 2016, we provided public transparency of the 2015 performance reported by our surgeon participants. 55,56 In doing so, we have become stewards, not bystanders, accepting the responsibility to improve patient care. We successfully captured more than 1 million patient-measure encounters, participated in CMS programs designed to link reimbursement to performance, and provided our surgeons with a method for satisfying American Board of Surgery Maintenance of Certification requirements. As public and private payers of care introduce new incentivized reimbursement programs, we are well prepared to participate with our "tested" breast-specific QMs.
ACKNOWLEDGMENT We thank Sharon Grutman for ASBrS Patient Safety and Quality Committee support, Mena Jalali for Mastery Workgroup support, Mastery Workgroup members (Linda Smith, Kathryn Wagner, Eric Brown, Regina Hampton, Thomas Kearney, Alison Laidley, and Jason Wilson) for QM vetting, Margaret and Ben Schlosnagle for quality measure programming support, Choua Vang for assistance in manuscript preparation, and the Gundersen Medical Foundation and the Norma J. Vinger Center for Breast Care for financial support. We especially thank Eric and Thomas Whitacre for Mastery program development.

Strengths
Capability to use the program for "plan-do-study-act" cycles 52,53
No participation fee for members before 2016 a

Limitations
Peer performance comparison not yet risk-adjusted
Unknown rate of nonconsecutive patient data entry
No significant patient or payer input into quality measure list or ranking to reflect their preferences and values 54
Unknown rate of surgeon "dropout" due to their perception of poor performance

a $100.00 fee began in 2016