INTRODUCTION

Low-value care has been characterized as services that provide little to no benefit to patients, have potential to cause harm, incur unnecessary costs to patients, or waste limited healthcare resources.1,2,3 Efforts to minimize the use of low-value care are increasingly important to the public, providers, healthcare systems, and payers. The Choosing Wisely® campaign, in particular, has focused on identifying potentially unnecessary treatments, tests, and services that patients and physicians should question.4 In parallel, studies of interventions that target low-value care have grown rapidly, and sponsors are increasingly funding work in this area.5

Prior studies have examined the types, effectiveness, and quality of interventions to reduce the use of low-value services and the validity of specific measures used to assess low-value care.1,6,7 However, we have limited knowledge of how the effects of interventions to reduce unnecessary services are being measured (i.e., what types of intervention outcomes are being assessed?). To ensure that we are measuring outcomes that are clinically meaningful, and because complex interventions can have far-reaching and often unintended consequences, it is important that efforts to reduce low-value care are assessed in comprehensive and systematic ways.8,9

The purpose of this review was to characterize and examine what types of measures are routinely employed in studies of interventions designed to reduce low-value services. We hypothesized that existing studies largely focus on utilization rather than more clinically meaningful measures, such as appropriateness. We also hypothesized that unintended consequences (such as underuse of appropriate services) are not being systematically assessed, and that patient-reported measures are used infrequently. By presenting a framework for measuring the effects of interventions to decrease low-value services and elucidating the current status of intervention assessment, we sought to highlight current gaps and inform researchers and sponsors about what types of clinically meaningful measures should be incorporated in future studies.

METHODS

We performed a systematic review to identify studies that evaluated interventions to reduce low-value care. We adhered to the standards outlined by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement in conducting and reporting our systematic review (Online Appendix A).

Search Strategy

To identify potentially relevant studies, we searched PubMed, Web of Science, and ClinicalTrials.gov. Our search strategy was based on a previously published systematic review by Colla et al. of interventions used to decrease low-value care, which included publications through early 2015.6 We expanded the terms from this prior review with input from a medical librarian to ensure inclusion of all possibly relevant studies (Online Appendix B). In brief, we searched for articles that contained any of the key terms “health services misuse,” “unnecessary procedures,” “low value,” “waste,” “overu*,” or “wasteful,” in addition to any term belonging to one of nine topical search term sets. These sets contained multiple terms and synonyms related to possible intervention designs to reduce use of low-value care: (1) cost sharing and value-based purchasing, (2) patient education and decision-making, (3) quality indicators and reporting, (4) physician performance incentives, (5) utilization management strategies, (6) financial risk sharing and physician reimbursement, (7) clinical decision support, (8) provider education, and (9) provider feedback and peer reporting. We focused on studies published or initiated after the 2010 introduction of the “Top Five List” (a precursor to Choosing Wisely®),10 restricting our search to January 1, 2010, through December 31, 2016. The search was not restricted by language.
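The Boolean structure described above (any key term, combined with any term from the nine topical sets) can be sketched as follows. This is a minimal illustration, not the actual search string: the function names are hypothetical, the topical terms are placeholders obtained by splitting each set's name, and the real terms and synonyms are those listed in Online Appendix B. Database-specific field tags and the date restriction are not modeled.

```python
# Key terms quoted in the Search Strategy text.
KEY_TERMS = [
    "health services misuse",
    "unnecessary procedures",
    "low value",
    "waste",
    "overu*",
    "wasteful",
]

# Names of the nine topical sets, as listed in the text.
SET_NAMES = [
    "cost sharing and value-based purchasing",
    "patient education and decision-making",
    "quality indicators and reporting",
    "physician performance incentives",
    "utilization management strategies",
    "financial risk sharing and physician reimbursement",
    "clinical decision support",
    "provider education",
    "provider feedback and peer reporting",
]

# ASSUMPTION: placeholder terms only. Each set is represented by the pieces
# of its own name; the real term lists are in Online Appendix B.
TOPICAL_SETS = {name: name.split(" and ") for name in SET_NAMES}


def format_term(term):
    """Quote multi-word phrases; leave single words and wildcards bare."""
    return f'"{term}"' if " " in term else term


def or_group(terms):
    """Join terms into one parenthesized OR group."""
    return "(" + " OR ".join(format_term(t) for t in terms) + ")"


def build_query(key_terms, topical_sets):
    """Require at least one key term AND at least one topical term."""
    topical_terms = [t for terms in topical_sets.values() for t in terms]
    return f"{or_group(key_terms)} AND {or_group(topical_terms)}"


query = build_query(KEY_TERMS, TOPICAL_SETS)
print(query)
```

Pooling the nine sets into a single OR group reflects the "any term belonging to one of nine sets" wording; a search that required a match within each set separately would be far more restrictive.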

Study Selection Criteria

Three authors (SL, MK, JM) reviewed titles and abstracts identified by the search strategy to select potentially relevant studies (Fig. 1). Studies were included if they presented primary research that examined the effect of an intervention to reduce low-value care. As previously mentioned, low-value care was defined as services that provide little to no benefit to patients, have the potential to cause harm, incur unnecessary costs to patients, or waste limited healthcare resources.1,2,3 Studies were excluded if they examined cost reduction only. For studies that met inclusion criteria, full-text articles of published studies (from PubMed and Web of Science) and protocols for ongoing studies (from ClinicalTrials.gov) were retrieved and reviewed as described below. Our partners at AcademyHealth also reviewed our search strategy and recommended inclusion of two additional studies that were not identified in our initial literature search.11,12 After full-text review of these articles to ensure that they met inclusion criteria, they were also included in this systematic review.

Figure 1

Flow diagram of search results. *WOS, Web of Science. De-prescribing, de-intensification, unnecessary (lab, test, imaging, utilization), performance measurement, and behavioral economics. Two articles were identified as published studies in ClinicalTrials.gov, and two articles were identified by AcademyHealth from the Choosing Wisely campaign.

Development of Measure Categorization Framework

We developed a rubric to categorize measure type. We first reviewed recommendations and literature from measurement organizations including National Quality Forum13 and the American College of Physicians,3 recent systematic reviews1,6,14 of studies on low-value care, and recently published work8,9 on proposed approaches to measuring overuse and low-value care. From this body of literature, we developed a comprehensive framework to capture the wide range of effects—both intended and unintended—of interventions to reduce use of low-value care. While we recognized that many studies might simply focus on reductions in utilization, we deliberately sought to incorporate categories for measures that are clinically meaningful and patient-centered. For example, a study designed to decrease the use of antibiotic prescriptions for viral infections could measure, in addition to the reduction in overall use, whether prescribing was reduced for the right patients (appropriateness), whether patients in the intervention arm had fewer allergic reactions (outcomes), patient satisfaction with the encounter (patient-reported experiences), and whether there were any increases in hospital admissions for patients who did not receive antibiotics (unintended outcomes). Additional examples of measure categorization may be found in Online Appendix F. We further categorized each measure according to whether it assessed unintended consequences of the intervention (e.g., underuse of services, substitution of services, patient-reported experiences, provider-reported experiences, or patient-provider interactions). The resulting measure categorization framework is presented in Table 1.

Table 1 Framework for Measure Categorization

Data Extraction and Analysis: Measures

Four physicians (EK, SS, JM, SB) reviewed full-text articles to extract data using a standardized data extraction form. We extracted measure specifications (count, scale, or proportion), measure type (using the framework described above), and whether or not the measure assessed unintended consequences (Table 1). Unintended consequences were categorized as “definite” if they were explicitly stated as such (or a synonymous term or concept clearly identified it as such) in the methods section of the study, or “possible” if they were not specifically stated as such by the study authors, but measurement of an unintended consequence was inferred by reviewers. For example, if a study designed to decrease use of antibiotics for viral infections also examined number of subsequent emergency department (ED) visits, we coded ED visits as a measure of unintended consequences even if the authors did not specifically label it as such. Duplicate review was performed on all measures. Discrepancies were resolved by discussion to reach consensus.
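One way to picture a single row of such an extraction form is sketched below. This is an assumption-laden illustration rather than the form actually used: the class and field names are hypothetical, and the measure types are taken from the categories reported in the Results rather than from Table 1 itself. The "definite" versus "possible" rule follows the definition given above.

```python
from dataclasses import dataclass
from enum import Enum


class Specification(Enum):
    COUNT = "count"
    SCALE = "scale"
    PROPORTION = "proportion"


class MeasureType(Enum):
    UTILIZATION = "utilization or ordering"
    APPROPRIATENESS = "appropriateness"
    OUTCOME = "outcome"
    COST = "cost"
    PATIENT_REPORTED = "patient-reported"
    PROVIDER_REPORTED = "provider-reported"
    PATIENT_PROVIDER_INTERACTION = "patient-provider interaction"
    VALUE = "value"


class Unintended(Enum):
    NONE = "none"
    DEFINITE = "definite"  # explicitly labeled by the study authors
    POSSIBLE = "possible"  # not labeled, but inferred by reviewers


def code_unintended(labeled_by_authors, inferred_by_reviewers):
    """Apply the definite/possible rule described in the text."""
    if labeled_by_authors:
        return Unintended.DEFINITE
    if inferred_by_reviewers:
        return Unintended.POSSIBLE
    return Unintended.NONE


@dataclass(frozen=True)
class MeasureRecord:
    study_id: str  # hypothetical identifier
    description: str
    specification: Specification
    measure_type: MeasureType
    unintended: Unintended


# The ED-visit example from the text: not labeled by the authors,
# but inferred by reviewers, so it is coded as "possible".
example = MeasureRecord(
    study_id="hypothetical-001",
    description="subsequent ED visits after an antibiotic-reduction intervention",
    specification=Specification.COUNT,
    # ASSUMPTION: the text does not state which measure type ED visits received.
    measure_type=MeasureType.OUTCOME,
    unintended=code_unintended(labeled_by_authors=False, inferred_by_reviewers=True),
)
print(example.unintended.value)
```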

Data Extraction and Analysis: Study Characteristics

Three authors (SL, MK, WF) also independently extracted data from each full-text article that met inclusion criteria, to identify study characteristics. We extracted the following study characteristics: publication year, setting (e.g., inpatient, outpatient), study population, clinical function (screening, prevention, treatment, or diagnostic testing and surveillance), type of service(s) which were intended to be reduced (medications, labs/pathology, imaging, procedures, cost, other), who initiated the intervention (payer, delivery system, other), whether randomization was used, and whether a control group was employed. We also extracted information regarding the intervention target (e.g., patient, provider, system) and mechanism (e.g., cost sharing, pay for performance, insurance payment policy). Finally, we extracted funding source, if any, from all published studies. Any uncertainty about study characteristics was resolved through discussion to reach consensus. As prior systematic reviews performed quality assessments on many of the studies included in our review, we did not include a quality assessment and chose to focus primarily on the characteristics of measures within these studies.

RESULTS

Results from Published Studies

Characteristics of Studies

We identified 1311 original manuscripts from PubMed and Web of Science. Including the additional two manuscripts that were identified by our partners in AcademyHealth and two that were identified from completed studies in ClinicalTrials.gov, our search strategy resulted in 1315 original manuscripts (Fig. 1). After reviewing these by title and abstract, we excluded 1214 due to irrelevant study topic, no identifiable intervention and/or low-value care targeted, manuscript comprising either a systematic review or meta-analysis, or duplicate study. In total, 101 manuscripts met inclusion criteria.
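The screening arithmetic in this paragraph can be checked directly. The variable names below are illustrative; the counts are those reported in the text.

```python
# Counts reported in the text.
from_databases = 1311          # PubMed and Web of Science
from_academyhealth = 2         # recommended by AcademyHealth
from_clinicaltrials_gov = 2    # completed studies in ClinicalTrials.gov
excluded_at_screening = 1214   # excluded on title and abstract review

total_manuscripts = from_databases + from_academyhealth + from_clinicaltrials_gov
included = total_manuscripts - excluded_at_screening

print(total_manuscripts)  # 1315
print(included)           # 101
```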

These 101 studies were conducted in a variety of practice settings, including inpatient only (42%), outpatient only (32%), and other or multiple settings (27%). Few studies (30%) employed an external control group, and only 19% used randomization. Clinical functions included screening (12%), prevention (7%), treatment (47%), and diagnostic testing or surveillance (53%), with some studies covering more than one domain. The low-value services targeted in these studies included laboratory testing (including pathology) (34%), medication use (32%), imaging tests (26%), cost of care (15%), and medical or surgical procedures (14%) (Table 2). Forty-eight studies (48%) were externally funded (e.g., through grants). Twenty-five of these 48 studies (52%) received federal funding (US or non-US). A complete list of studies included in this review is provided in Online Appendix C, and additional details for each study are provided in Online Appendix D.

Table 2 Characteristics of Published and Ongoing Studies

Characteristics of Interventions

Most interventions targeted providers, with 60% of studies using education or guidelines, 44% using clinical decision support tools, 42% using feedback mechanisms (e.g., report cards), and 3% using pay-for-performance. A small number of studies (13%) comprised interventions that focused on patients, of which the majority were education-related. Finally, a minority of studies (5%) included interventions that targeted payers (e.g., changes to insurance or payment policy).

Characteristics of Measures Within Studies

Most studies used at least one measure of utilization or ordering, while fewer used measures of appropriateness, outcomes, or cost. Patient reports (including PREMs and PROMs), provider reports, measures of the patient-provider interaction, and measures of value (e.g., cost-effectiveness) were used infrequently (Table 3). A summary of measures from each published study and specific examples of measures from selected studies can be found in Online Appendix E and Online Appendix F, respectively.

Table 3 Summary of Measures from Published and Ongoing Studies

Utilization Measures

Sixty-nine studies (68%) included a measure of utilization of care, including use of antibiotics/medications, laboratory testing, imaging tests, medical procedures, consultation of specialty services, screening tests or procedures, and transfusion of blood products. These were largely collected as overall proportions and rates of service utilization across units or clinics, often determined pre- and post-intervention, without considering whether the service was indicated in specific circumstances. For example, due to the rising rate of antibiotic resistance among the neonatal population, Nitsch-Osuch and colleagues studied the impact of a hospital antibiotic policy on overall antibiotic use in the neonatal intensive care unit (ICU).15 Similarly, given the overuse of laboratory testing, Procop and colleagues compared the impact of two clinical decision support tools (“Hard Stop,” which required a call to the laboratory to provide justification for ordering a duplicate test, and “Smart Alert,” which was simply a notification to the ordering provider that a duplicate test was ordered) by measuring the number of duplicate laboratory tests placed.16

Appropriateness Measures

Fifty-three studies (52%) included a measure of appropriateness. These measures assessed appropriate and inappropriate use, including overuse of a variety of services and procedures among specific patients for whom the service was not indicated. For example, several studies examined overuse of medications among patients not requiring treatment, such as antibiotics prescribed for patients with viral upper respiratory infections or asymptomatic bacteriuria, or acid suppressive therapy among low-risk inpatients.17,18,19 Other studies examined inappropriate use of other treatments (e.g., transfusions of red blood cells in patients with hematocrit greater than 21% or fresh frozen plasma among patients who did not have a prolonged INR and/or active bleeding), imaging modalities (e.g., MRI for back pain in patients without red flags, or neuroimaging for uncomplicated headaches), and medical procedures (e.g., rate of cesarean sections in low-risk births).20,21,22,23 Finally, appropriateness measures also included rates of adherence to guidelines. For example, Kost and colleagues assessed clinical decisions made by primary care physicians before and after the implementation of Choosing Wisely® in common scenarios such as antibiotic prescribing for acute sinusitis or use of dual-energy x-ray absorptiometry (DEXA) for bone loss among low-risk women.24

Outcome Measures

Forty-one studies (41%) included an outcome measure. These included length of stay, hospitalization, ICU admission, hospital readmission, and mortality. For example, Algaze and colleagues found that a computerized order entry rule reduced laboratory utilization but did not affect hospital length of stay or mortality among pediatric cardiovascular ICU patients.25 Outcome measures also included adherence to recommendations, delayed care, and treatment failures. For example, outpatient parenteral antibiotic therapy has been shown to be overused among patients with a variety of infectious diseases despite national guidelines.26 Due to ongoing overuse of such therapy, Conant and colleagues investigated the impact of mandatory infectious disease approval for outpatient parenteral antibiotic therapy. The authors found a low rate of treatment failures among patients for whom authorization of outpatient therapy was denied.27

Cost-Related Measures

Thirty-six studies (36%) assessed the impact of interventions on costs of care. Measures included cost related to medications, inpatient admissions, laboratory tests, procedures, specialty services, imaging, and blood product transfusions. For example, in one study, global payment contracts between payer and provider organizations were found to decrease spending for cardiovascular services and imaging.28 In another study, a computerized integrated antibiotic authorization system reduced expenditure related to antibiotic overuse.29

Patient-Reported Measures

Eight studies (8%) included patient-reported measures, such as PREMs and PROMs. Examples of patient-reported measures included patient satisfaction and quality of life.30,31 Funded studies (N = 48) were more likely to use patient-reported measures than unfunded studies (17% versus 0%).

Other Measures

Three studies (3%) included provider-reported measures, including physicians’ intention to follow practice guidelines, physicians’ feedback and experiences related to an intervention, and self-reported comfort level with procedures such as hysteroscopy.32,33,34,35 Only one study included a measure of value (cost-effectiveness of alternative coronary heart disease diagnostic testing strategies),30 and one study included a measure of patient-provider interaction.22

Measures of Unintended Consequences Within Studies

A total of 34 studies included at least one measure of an unintended consequence. Across these 34 studies, 75 of the 349 measures identified in our review (21%) assessed unintended consequences, including measures of appropriateness, utilization, outcomes, and patient-reported experiences. Fifteen studies included a total of 30 measures that we categorized as “definite” measures of unintended consequences (i.e., stated explicitly in the study), and the remaining 19 studies included a total of 45 measures that we categorized as “possible” measures of unintended consequences (i.e., inferred by the reviewer).
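As a consistency check, the "definite" and "possible" tallies above sum to the reported totals. The snippet is only a sketch of that arithmetic, with illustrative variable names and the counts taken from the text.

```python
# Counts reported in the text.
definite = {"studies": 15, "measures": 30}
possible = {"studies": 19, "measures": 45}
total_measures_in_review = 349

studies_with_unintended = definite["studies"] + possible["studies"]
unintended_measures = definite["measures"] + possible["measures"]
share = unintended_measures / total_measures_in_review

print(studies_with_unintended)  # 34
print(unintended_measures)      # 75
print(round(100 * share))       # 21
```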

Of the 75 measures, the majority (87%) assessed outcomes (e.g., length of stay or mortality). The remainder assessed utilization or ordering (7%), appropriateness (3%), and patient-reported outcomes and experiences (4%). No studies used measures of provider-reported experiences, patient-provider interactions, value, or costs to assess unintended consequences (Table 4).

Table 4 Measures of Unintended Consequences from Published and Ongoing Studies

Results from Ongoing Studies

In addition to the studies identified from PubMed and Web of Science, we identified 490 potentially relevant studies from ClinicalTrials.gov (Fig. 1), of which 16 met inclusion criteria for review (Online Appendix C). These studies were conducted in a variety of practice settings, including inpatient only (44%), outpatient only (31%), and other or multiple settings (25%). Seventy-five percent of the studies employed a control group, and 81% used randomization. Clinical functions included screening (6%), prevention (6%), treatment (88%), and diagnostic testing or surveillance (25%), with some studies covering more than one domain. Low-value services targeted included medications (75%), laboratory testing (including pathology) (13%), imaging (13%), and procedures (6%) (Table 2). All of the studies used interventions targeting providers, and four studies used interventions targeting patients. None of these ongoing studies targeted payers.

Twelve studies (75%) included at least one outcome measure, and 11 (69%) included at least one utilization measure. These were followed by 8 studies (50%) with an appropriateness measure, 8 studies (50%) with a patient-reported measure, 3 (19%) with a cost measure, 2 (13%) with a provider-reported measure, and 1 (6%) with a patient-provider interaction measure (Table 3). None of the studies included a measure of value.

Twenty-two of 103 measures (21%) assessed unintended consequences, with 10 studies (63%) measuring at least one unintended consequence. Three were “definite” measures of unintended consequences, while 19 were “possible.” Of the 22 measures, the majority (77%) assessed unintended consequences related to outcomes. The remainder assessed utilization or ordering (9%) or were patient-reported measures (14%). No studies used measures of appropriateness, cost, provider reports, patient-provider interaction, or value to assess unintended consequences (Table 4).

DISCUSSION

In this systematic review, we found that across 101 studies of interventions to reduce low-value care, more than two-thirds focused on rates of utilization. About half of the studies examined changes in appropriateness of services, a more clinically meaningful measure for studies intending to reduce low-value care. Studies rarely used patient-reported measures, provider-reported measures, or measures of patient-provider interactions. Funded studies were more likely than unfunded studies to use patient-reported measures (17% vs 0%). Finally, only one-third of studies included a measure of unintended consequences. Even then, many of these measures assessed readily available but rare outcomes such as mortality.

Among the published intervention studies, fewer than one-third used randomization or an external control group. Perhaps unsurprisingly, ongoing clinical trials registered in ClinicalTrials.gov were more likely to use randomization (81% vs 19%) or a control group (75% vs 30%) than published manuscripts (which included not only clinical trials, but also quality improvement studies). Like the published literature, most of the ongoing trials used measures of utilization (69%), but a greater proportion used outcome measures (75%) and patient-reported measures (50%). Provider-reported and patient-provider interaction measures were used rarely. Unlike published studies, over half (63%) of the ongoing trials included measures of unintended consequences. However, the sample size for ongoing studies was small (N = 16), limiting our ability to draw robust conclusions.

These findings highlight critical gaps in the way we measure the effects of interventions to reduce low-value care. As we have discussed, such interventions are often complex, comprising multiple components that are tested in active healthcare delivery contexts which include a broad array of stakeholders. As a result, these interventions can have unexpected and (in some instances) unintended effects on clinical processes and outcomes as well as patient and provider experiences and outcomes. It is therefore imperative that the evaluations of these interventions consider broader sets of measures to assess the interventions’ effects, both positive and negative.

To this end, we present a comprehensive and systematic framework of measures that researchers may apply in studies of interventions to reduce low-value care. In particular, researchers should incorporate more patient-centered measures to ensure that the right services are being reduced in the right patients, that patient-provider relationships are assessed, and that outcomes are improved (or do not worsen). Measures that focus on appropriateness, patient reports, and clinically meaningful outcomes that the intervention is adequately powered to assess are particularly important. In addition, unintended consequences need to be assessed routinely to ensure that interventions are reducing low-value care without promoting harm. In our review, unintended consequences were assessed in only about one-third of published studies. Moreover, in both published and ongoing studies, most of these measures assessed rare outcomes (e.g., mortality) for which the study was likely underpowered. Studies infrequently assessed unintended effects on processes of care and intermediate outcomes, such as underuse of recommended services, care location shift, increasing use of an alternative test or treatment (substitution), damage to the patient-physician relationship, or patient or provider dissatisfaction.7,8

Strengths of our study include a comprehensive literature search of both published and ongoing studies and development of a framework to characterize measures. Despite efforts to clearly define measure categories, variation in classification among the reviewers was expected. However, we did conduct a duplicate review of all measures for both published and ongoing studies with discrepancies resolved through discussion.

There are several limitations to our study. First, we did not perform a quality assessment of the studies themselves. However, a quality assessment that included many of our studies was recently published by Colla and colleagues.6 Second, despite a comprehensive literature search, there remains a possibility that we may have missed relevant studies, but it is likely that the number of omissions was small. Finally, data from the ongoing, unpublished studies should be interpreted with caution given the small number and the limited details that were available in the study protocols. In addition, the measures that were reported in ClinicalTrials.gov may or may not ultimately be used in the published evaluation of these studies.

In conclusion, our findings confirm our hypotheses that (1) existing studies largely focus on utilization rather than more clinically meaningful measures, such as appropriateness, (2) unintended consequences are often not systematically assessed, and (3) patient-reported measures are used infrequently. Study designers and evaluators should explicitly incorporate more clinically meaningful and patient-centered measures into study designs. Additionally, researchers and funding agencies should develop standardized guidance for study designs of interventions to reduce low-value care. Finally, editors and reviewers should request that patient-centered measures be included in evaluations (or their absence be clearly mentioned in limitations). As we seek to develop and test increasingly complex interventions to reduce low-value care, we must be sure to comprehensively assess the effects of these interventions on the totality of clinical care and the patient experience.