Introduction

Minimizing disability, inappropriate time off work, and their economic sequelae remain major goals of occupational medicine. Several studies have demonstrated the clinical and financial benefits of ergonomic, disability management, and return-to-work interventions [1–3]. In many states, workers’ compensation systems have adopted guidelines to prevent workers from receiving treatments that appear unnecessary, may delay return to work, or may even be harmful. However, less attention has been paid to ensuring that injured workers receive the basic, essential medical care processes involved in making a correct diagnosis, alleviating symptoms, and addressing activities and functional limitations. Better quality medical care would benefit both workers and employers. In one randomized controlled trial in Spain, improving medical care for musculoskeletal conditions reduced time on temporary disability by 37%, the percentage of temporarily disabled workers going onto permanent disability by 50%, and total costs (including disability and medical care) by 37% [4]. Given the potential benefits to workers and employers, several provider organizations and payers would like to see quality assessment and improvement activities become more routine in occupational medicine.

Carpal tunnel syndrome (CTS) should be a key target for such activities because it is prevalent and costly, and because there is indirect evidence of quality deficits. CTS affects three out of every 10,000 full-time workers [5]. For each workers’ compensation claim for CTS, employers pay a median of $1,468 to $11,941 (inflated to 2009), depending on whether surgery is performed [6, 7]. Each worker with CTS experiences a cumulative loss of future earnings equal to $45,000 to $89,000 [8].

For patients with CTS, diagnostic evaluations and non-operative management are highly variable, which may indicate care is of inconsistent quality. Recommended history and physical examination elements are performed inconsistently [9, 10], and physicians differ in the criteria they use to diagnose CTS [11]. This variability in care appears to affect when patients receive a CTS diagnosis and how long they stay off work. A Washington State study found that half of workers’ compensation claims for CTS were initially filed for other conditions, and 20% of the time the CTS was not diagnosed until more than three months into the claim. Later diagnoses were associated with longer disability [6].

To evaluate quality of care for occupational disorders like CTS, specific quality measures are needed. Process-oriented quality measures identify basic, well-established care processes that patients should or should not receive under specific circumstances. The purpose of such measures is not to advance the standard of care but rather to make existing standards explicit and measurable. Although guidelines and measures can both help to standardize and improve care, guidelines cannot be used to measure quality (other similarities and differences between guidelines and measures are explored below). For an occupational condition, a set of quality measures should consider both medical and occupational issues, such as whether a patient’s symptoms are associated with occupational activities and how occupational activities should be modified. Existing sets of measures, such as one set for back pain, often neglect occupational considerations [12].

The objective of this study was to develop a set of quality measures that can be used to objectively assess the quality of the diagnostic evaluation and therapeutic management of CTS, with an emphasis on issues specific to occupational settings. We developed these measures using a variation of the well-established RAND/UCLA Appropriateness Method. A particular strength of this method is that it considers available literature but is able to overcome gaps in research evidence by rigorously synthesizing the experience of expert clinicians [13]. Randomized controlled trials do not exist for most health care processes [14], including for many aspects of care for CTS [15]. In such circumstances, syntheses of clinical expertise are a valid and important form of evidence. This is demonstrated by the fact that, in several studies addressing a variety of conditions, better adherence to measures developed using the RAND/UCLA Appropriateness Method has been associated with improved patient outcomes [16–18].

Materials and Methods

Measure development is a three-step process: (1) developing draft measures by integrating guidelines and literature; (2) refining and selecting measures, in this case using a variation of the RAND/UCLA panel method; and (3) testing the measures against a data source. We report the first two steps in this paper.

We also developed measures to assess the quality of electrodiagnostic testing [19], whether carpal tunnel release surgery is performed for appropriate indications [20], and the quality of peri-operative management; these measures are being reported elsewhere.

Developing Draft Measures

Developing draft measures was an iterative process involving collaboration among a rheumatologist, a physiatrist, two internists with expertise in quality measurement, and two hand surgeons, as well as a project advisory board that included five occupational medicine physicians.

First, we identified aspects of care relevant to improving quality for CTS (for example, the initial physical examination) using relevant clinical practice guidelines and other summary literature. We conducted a general literature search on CTS, updated a 2004 search for relevant guidelines by searching MEDLINE and the National Guidelines Clearinghouse, and accessed personal reference collections [21]. Team physicians reviewed the guidelines and literature, chose care processes that are likely to affect patient outcomes or that are widely recommended, then wrote draft measures.

Next, directed MEDLINE searches were conducted to identify evidence pertinent to the draft indicators. A reference librarian conducted the searches and excluded case reports and animal studies. The searches included the terms carpal tunnel syndrome OR median neuropathy, with additional MeSH terms for specific subtopics: diagnosis (classification, severity, history, occupation, and tests), non-surgical treatment (therapy, drug therapy, rehabilitation), and return to work issues (disability, ergonomic, work). Team physicians sequentially reviewed titles, abstracts, and articles to assess relevance to each draft measure. Of the citations reviewed, 1,635 pertained to the diagnosis of CTS, 475 to non-surgical treatment, and 538 to return to work issues. Draft measures were refined, added, and deleted on the basis of search results.

Next, physicians summarized, for each draft measure, the evidence supporting the relationship between the care process and patient outcomes, emphasizing the highest quality evidence identified. Given that most of the evidence was not high quality, we used a simplified classification scheme: level 1, randomized controlled trial; level 2, observational study; and level 3, case reports, case series, and expert opinion. Where level 1 evidence was not available, the summary described a chain of evidence or clinical rationale.

Refining and Selecting Measures

Methods for refining and selecting quality measures were based on the RAND/UCLA Appropriateness Method, a multidisciplinary, two-round, modified-Delphi process that enables researchers to obtain a quantitative assessment that reflects the judgment of a group of experts. This method (explained below) has been used previously to develop quality measures for a wide variety of conditions and types of care. Additional background information and technical details about this method have been published previously [13, 22]. The method has reproducibility consistent with that of well-accepted diagnostic tests like screening mammography—i.e., separate panels examining the same topic have produced similar recommendations (kappas 0.51–0.83). Further, the measures developed using this method have been shown to have content, construct, and predictive validity, as evidenced by the fact that measures have been consistent with the results of subsequent randomized controlled trials or associated with improved patient outcomes. For example, panel judgments about the appropriateness of carotid endarterectomy were consistent with the findings of a subsequent randomized trial [23]. For arthroplasty of the knee and hip, adherence to measures addressing the appropriateness of surgery was found to be associated with improved quality of life [18]. For vulnerable elders, adherence to quality measures developed using this method was found to be associated with improved survival [16].

To select panelists for the current study, we asked U.S. specialty societies to recommend physicians who are leaders in each specialty, and then we reviewed curriculum vitae, interviewed candidates, and contacted references. The panel had eleven members: an occupational medicine physician, a neurologist, a physiatrist, a family physician, a physical therapist, four hand surgeons (one with primary board designation in plastic surgery and three in orthopedic surgery), and two orthopedists. We chose this balance of specialties because panelists rated many measures pertaining to carpal tunnel release surgery as well as the diagnostic evaluation and non-operative management. Panelists represented a variety of geographic locations, expertise, and both academic and community practice settings.

The first round of ratings involved having panelists rate the measures at home. Panelists received the evidence summaries, draft measures, ballots, and instructions. During the second round, panelists met in person and research team members moderated discussions of each draft measure, the evidence, and first-round ratings. We used a modified-Delphi panel method, rather than a consensus-panel method that forces agreement, to allow different attitudes to be expressed and contend with one another and true agreement or disagreement to emerge. Each panelist received a summary of the first-round ratings for each measure, including the median, standard error, his/her rating relative to the distribution, and the analytic interpretation. Panelists suggested modifications to definitions of key terms and measures; these were adopted when a majority voted to do so. After all opinions had been voiced for a measure, panelists marked private, equally weighted ballots.

For both rounds, panelists rated validity, feasibility, and importance on 9-point scales (9 = highest). Validity meant: (1) adequate scientific evidence or professional consensus exists to support a link between the performance of care specified by the measure and improved clinical outcomes; and (2) based on the panelists’ professional experience, health professionals with significantly higher rates of adherence to a measure would be considered higher-quality providers [13]. Panelists also rated measures for feasibility and importance to facilitate future users’ efforts to prioritize the measures. Feasibility meant the potential ability to evaluate adherence to the measure using medical records. Importance meant the magnitude of the potential effect on patient outcomes.

As is standard for this method, ratings interpretations included: valid = a median of 7–9 without disagreement; not valid = a median of 1–3 without disagreement; uncertain validity = a median of 4–6 or any median with disagreement. Disagreement was defined as three or more panelists rating in the 1–3 range and three or more in the 7–9 range [13]. Measures were considered potentially feasible if the median was 4 or above. There was no minimum threshold for importance because this variable was intended to help future users prioritize the measures.
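The interpretation rules above can be expressed compactly in code. The following is a minimal sketch, not part of the study's materials: the function names and the sample ballots are hypothetical, and the handling of a non-integer median (possible only with an even number of raters) as "uncertain" is an assumption made for illustration.

```python
from statistics import median


def has_disagreement(ratings):
    """Disagreement: three or more ratings in 1-3 AND three or more in 7-9."""
    low = sum(1 for r in ratings if 1 <= r <= 3)
    high = sum(1 for r in ratings if 7 <= r <= 9)
    return low >= 3 and high >= 3


def interpret_validity(ratings):
    """Classify one measure's validity ratings (9-point scale, 9 = highest)."""
    if has_disagreement(ratings):
        return "uncertain"  # any median with disagreement
    m = median(ratings)
    if m >= 7:
        return "valid"
    if m <= 3:
        return "not valid"
    # Median of 4-6. Assumption: fractional medians such as 3.5 or 6.5
    # (even-sized panels only) are also treated as uncertain.
    return "uncertain"


def is_potentially_feasible(ratings):
    """Feasibility threshold stated in the text: median of 4 or above."""
    return median(ratings) >= 4


# Hypothetical ballots from an eleven-member panel.
print(interpret_validity([7, 8, 8, 9, 7, 6, 8, 9, 7, 8, 5]))  # valid
print(interpret_validity([1, 2, 3, 8, 8, 9, 7, 7, 8, 9, 7]))  # uncertain
print(interpret_validity([2, 2, 3, 1, 3, 2, 4, 3, 2, 1, 5]))  # not valid
```

The second ballot has a median of 8 but is still classified as uncertain, because three panelists rated in the 1–3 range while three or more rated in the 7–9 range.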

Comparison with Occupational Medicine Guideline

An occupational medicine physician assessed how concordant each passing measure was with the current occupational medicine guideline from the American College of Occupational and Environmental Medicine (ACOEM) [24]. Observations were discussed with another physician who also compared the measures and guidelines.

Pilot Testing

After identifying measures meeting the validity and feasibility criteria, RAND/UCLA team members developed a detailed tool for scoring the measures. For each measure, an experienced research nurse and research associate defined relevant terms within the measures, the populations or care eligible for the measure (the denominator), and instances in which care can be considered to adhere to the measure (the numerator). Timeframes for eligibility and adherence were specified. The team also anticipated feasibility issues, such as data elements that may be difficult to find in medical records or that could require subjective judgments by abstractors, and developed specific instructions to resolve them.
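To illustrate how a denominator, a numerator, and a timeframe combine when a single record is scored, the sketch below encodes one invented measure. It is not drawn from the RAND/UCLA scoring tool: the measure content, the field names, and the 30-day window are all hypothetical and serve only to show the structure.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Case:
    """Hypothetical data elements abstracted from one medical record."""
    first_visit: date
    has_hand_symptoms: bool
    occupational_history_date: Optional[date]  # None if never documented


def score_case(case: Case, window_days: int = 30) -> Optional[bool]:
    """Score one case against one illustrative measure.

    Returns None when the case is not eligible (outside the denominator),
    True when care adhered to the measure (counts in the numerator),
    and False when the case was eligible but care did not adhere.
    """
    # Denominator: only patients presenting with hand symptoms are eligible.
    if not case.has_hand_symptoms:
        return None
    # Numerator: an occupational history documented within the timeframe.
    if case.occupational_history_date is None:
        return False
    deadline = case.first_visit + timedelta(days=window_days)
    return case.first_visit <= case.occupational_history_date <= deadline


example = Case(date(2009, 3, 2), True, date(2009, 3, 16))
print(score_case(example))  # True
```

Keeping "not eligible" distinct from "eligible but not adherent" matters because only eligible cases enter the denominator when adherence rates are calculated.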

Pilot testing enabled us to examine feasibility issues and preliminary rates of adherence to the measures. Feasibility issues included the ease with which relevant patients can be identified, the availability of the medical records required to assess eligibility for and adherence to individual measures, and the clarity and usefulness of the scoring tool. The RAND/UCLA team pilot tested the measures and tool in a large workers’ compensation provider organization (Kaiser Permanente Northern California Regional Occupational Health) and in a large workers’ compensation insurance company (the California State Compensation Insurance Fund). Six nurses and one physical therapist (“abstractors”), who routinely perform claims reviews within each organization, underwent a two-day training in the use of the tool and scored several practice cases. Finally, they reviewed records for a small sample of patients who had been diagnosed with CTS or conditions often confused with CTS. Patients were randomly selected by applying pre-specified criteria (time period and diagnostic category) to administrative databases maintained by the insurance company. The abstractors working for the insurance company reviewed clinical records routinely collected for claims processing. The abstractors working for the provider organization reviewed electronic medical records for each patient. During the training and pilot testing, abstractors provided feedback on the tool. The pilot test activities were approved by each of the institutional human subjects’ protection committees; informed consent was not required.

Results

There were 40 draft measures. During the second round of the rating process, 30 measures were modified, 9 measures did not meet validity criteria, one of these 9 was also judged infeasible, and the remainder passed (31/40 measures passed, 78%).

Final Measures

Nine final RAND/UCLA CTS measures (Table 1) emphasized the initial evaluation of patients with hand and forearm complaints; 11 considered non-operative treatments such as splinting, steroid injections, and other medications; and 11 pertained to addressing activities and functional limitations.

Table 1 List of quality measures meeting validity and feasibility criteria

Table 2 lists the title of each measure, validity and feasibility ratings, and the highest level of supporting evidence. For few, if any, of these measures was there a large randomized controlled trial or high-quality observational study directly examining the effect of the care described. Nevertheless, in each instance, there is a convincing chain of evidence or clinical rationale that supports the practice. An “Appendix” provides the supporting rationale and a summary of the relevant literature.

Table 2 Quality measures: measure titles, ratings, and evidence level*

Comparison with Occupational Medicine Guideline

Seventeen measures (55%) are fully concordant with the ACOEM guideline, five are somewhat concordant (16%), the ACOEM guideline did not address content within eight of the measures (26%), and one measure is discordant with the guideline (3%) (see Appendix for list) [24]. This last measure addresses the use of non-steroidal anti-inflammatory agents (NSAIDs) for CTS symptoms.

Pilot Testing

Regarding feasibility issues, the provider organization readily identified eligible patients using ICD-9 and CPT codes and, because of its electronic medical record system, had no difficulty determining eligibility for and adherence to the measures. However, the insurance company had some difficulty identifying eligible patients because it uses broad diagnostic categories rather than ICD-9 and CPT codes, and some difficulty assessing eligibility for certain measures because its clinical records were incomplete. As to the scoring tool, the research team made many changes based on feedback from the seven abstractors. None of the measures were eliminated due to feasibility concerns.

Regarding preliminary rates of adherence, the pilot study included a total of 28 unique patients. Sixteen had been diagnosed with CTS and 12 with upper extremity disorders commonly confused with CTS. Twenty-four patients were eligible for one or more measures. Care was eligible for a measure a total of 559 times, and adhered to the measures 419 times (an overall adherence rate of 75%). Adherence rates were 66% for initial evaluation, 79% for non-operative treatment, and 81% for management of activities and functional limitations. These results illustrate the ability to assess quality of care for CTS and should not be considered representative of the care provided by these organizations.
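The overall adherence rate is the number of adherent instances divided by the number of instances in which care was eligible for a measure. The sketch below reproduces that arithmetic from the counts reported above; the function names and the list-of-scores input format are hypothetical and are not taken from the study's scoring tool.

```python
def adherence_rate(adherent_events, eligible_events):
    """Proportion of eligible instances in which care adhered to the measure."""
    if eligible_events <= 0:
        raise ValueError("at least one eligible instance is required")
    if not 0 <= adherent_events <= eligible_events:
        raise ValueError("adherent instances must lie between 0 and eligible")
    return adherent_events / eligible_events


def pooled_rate(scores):
    """Pool per-instance scores: True = adherent, False = eligible but not
    adherent, None = not eligible (excluded from the denominator)."""
    eligible = [s for s in scores if s is not None]
    return adherence_rate(sum(eligible), len(eligible))


# Counts reported for the pilot test: 419 adherent out of 559 eligible.
overall = adherence_rate(419, 559)
print(f"{overall:.0%}")  # 75%
```

The unit of analysis here is the eligibility instance rather than the patient, so a patient who is eligible for many measures contributes many times to the denominator.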

Discussion

This paper describes a set of measures that can be used to objectively assess the quality of medical care for carpal tunnel syndrome, with an emphasis on issues specific to occupational settings. The measures address the diagnostic evaluation and non-operative treatment of CTS, including assessing causality and managing occupational activities and functional limitations.

Quality measures that focus on care processes, as these do, are sometimes confused with treatment guidelines because they share development methods and clinical content. However, quality measures and guidelines serve complementary functions (see Table 3). Quality measures are rigid, quantitative tools that distinguish higher and lower quality care after the care has already been provided, whereas guidelines offer information that practitioners may or may not use during real-time clinical decision-making. Measures effectively become mandatory when adherence to them is used to assign penalties or rewards, as payers often do in non-occupational settings. Measures, for this reason, describe basic standards rather than best practices, are silent when the appropriate approaches are uncertain, and are used to assess quality at the population level. Conversely, guidelines are generally designed to be flexible and advisory; therefore, they cannot be accurately or reliably used as quality assessment tools because they permit providers to use their experience when applying recommendations to individual patients and address situations in which there is uncertainty about the preferred approaches. Finally, measures are scored in a systematic, highly structured fashion to ensure consistent results [25]. Thus, although occupational medicine guidelines exist for CTS [24], quality measures are also needed.

Table 3 Similarities and differences between process-oriented quality measures and clinical treatment guidelines

As noted in the Introduction, both payors and workers have substantial interests in improving the quality of care for CTS due to the high prevalence and costs associated with the condition. Two studies have demonstrated that quality improvement programs promoting adherence to treatment guidelines can decrease time off work and reduce costs. A randomized controlled trial in Spain demonstrated that improving care for workers with musculoskeletal injuries, including CTS, can markedly affect disability and its costs, saving eleven U.S. dollars per dollar invested [4]. A smaller Washington State program produced similar results: disability costs were reduced by 30% by improving adherence to treatment protocols and encouraging providers to prescribe activity and plan for return to work [26]. The savings could be even greater if the costs associated with reduced worker productivity were considered, since CTS is a common cause of absenteeism [27]. Thus, improving quality of care for occupational disorders may represent a unique “win–win” for workers and employers, the two central stakeholders in workers’ compensation systems.

Efforts to monitor and improve quality of care have already become commonplace in other aspects of the United States healthcare system. Most hospitals are now required to publicly report performance with regard to acute myocardial infarction, heart failure, and pneumonia [28]. The National Committee for Quality Assurance’s Healthcare Effectiveness Data and Information Set (HEDIS) enables health plans to monitor and report the quality of the care their enrollees receive. Because 90% of health plans participate in the HEDIS program and employers consider HEDIS scores in healthcare purchasing decisions [29], health plans have a financial incentive to improve quality of care. Comparable efforts to assess and improve care could be undertaken for occupationally associated disorders.

Provider organizations, payors, and others planning to use these measures will need detailed specifications to score them consistently. The research team has developed and pilot tested a comprehensive scoring tool that will support these efforts. This tool includes all of the measures, including those pertaining to electrodiagnosis and surgery. RAND will make the refined, final tool available for free on its website during the summer of 2010. Provider organizations may be in a better position to identify eligible patients and assess quality than payors are. We found this to be the case in our pilot study. Further, in non-occupational settings, providers typically perform these functions and report quality of care data to payers (with oversight and validation activities to ensure the integrity of the data).

Comparison with Occupational Medicine Guideline

Overall, we found substantial concordance between the RAND/UCLA CTS measures and the ACOEM guideline, a major occupational medicine guideline, although there are notable differences. The RAND/UCLA measures disapprove of NSAIDs for CTS because a randomized controlled trial showed no benefits and these medications increase the risks of gastrointestinal bleeding and myocardial infarction [30, 31], whereas the ACOEM guideline considers NSAIDs to be an appropriate option. Also, the ACOEM guideline addresses many important topics that, for reasons discussed above, the measures omit.

For example, no measure defines the optimal method for establishing a diagnosis of CTS. Many studies, guidelines, and commentators have wrestled with this issue. Certain approaches to history taking and physical examination have higher specificities for CTS, using positive electrodiagnostic tests as the gold standard. In turn, positive electrodiagnostic tests increase the probability that patients will respond to surgery [15]. However, as of yet, there appears to be no clear consensus as to the “correct” approach to synthesizing this information into a clinical diagnosis. Consequently, the quality measures address the diagnostic evaluation for CTS, but not the diagnosis itself.

While the ACOEM guideline will be useful for informing providers of the preferred means of caring for patients with occupational CTS, the RAND/UCLA measures can be used to assess quality of care and monitor the effectiveness of any improvement efforts. Individual providers can use these measures to evaluate the quality of the care they provide. Periodic retrospective chart review is a central component of the occupational and preventive medicine maintenance-of-certification processes [32, 33]. The RAND/UCLA CTS measures could be used in such reviews. Practices with multiple providers can evaluate quality for the practice and, if warranted, develop an infrastructure that supports improvement. Organizational efforts are particularly likely to be effective because they leverage the contributions of many individuals, and they enable systems to be established that make adherence simpler. Finally, payors of compensation claims might consider using these measures as a basis for referring patients to higher-quality providers, or as a basis for offering higher-quality providers greater remuneration.

Limitations

Quality measures do have limitations. Some important aspects of care for patients with CTS are not amenable to measurement. For example, patients can be sensitive about discussing potential barriers to returning to work, such as conflicts with supervisors, and some providers may conduct these discussions more effectively than others do. But many important aspects of care can be measured. Also, for each measure, unique clinical circumstances will warrant exceptions to the rule. Justifiable exceptions are not problematical so long as sample sizes are sufficient and exceptions are rare and randomly distributed among populations of patients.

These measures also have specific limitations. First, the literature examining these practices is rather limited, and most of the measures are based on expert consensus. Musculoskeletal disorders suffer from a lack of large, high-quality randomized controlled trials, and randomized controlled trials are not feasible for all aspects of care. In the past, this panel method has successfully overcome similar limitations to the literature for osteoarthritis, rheumatoid arthritis, arthroplasty of the knee and hip, and many other clinical situations [18, 34, 35]. Second, the panel included a higher proportion of surgeons than it would have if only diagnosis and non-operative treatment were considered. To mitigate this issue, we submitted the measures for each topic to relevant subspecialty journals in occupational medicine, neurology, and surgery, thereby ensuring that the measures undergo peer review by experts in these respective disciplines.

Third, the ultimate test of measures’ validity entails assessing whether better adherence is associated with better patient outcomes. We plan to begin a prospective study in September 2010 that will compare adherence to these measures with patients’ symptoms, functional status, time off work, and permanent disability ratings. We expect to find an association because associations have been found for previous sets of measures developed using the same methods. However, most quality measures in wide use today have yet to be tested in this fashion.

In conclusion, this project has developed 31 measures that can be used to evaluate the quality of the care for CTS. These measures appear to be the first quality measures to address both medical and occupational issues; therefore, they lay the groundwork for quality assessment activities to be introduced in occupational settings. These measures could be useful in a variety of efforts to improve quality of care for patients with CTS, whether initiated by providers, medical groups, payors, or policymakers. Similar measures should be developed for other work-associated disorders.