Introduction

Assessments of lumbar spine disorders have been based on biological, physiological, and anatomical outcomes, such as measurements of the range of spinal motion, laboratory tests, and imaging studies.1 However, these indicators have little meaning for patients and society. In contrast, alleviation of symptoms, such as reduced pain intensity, and an improved quality of life (QOL) are more meaningful to patients and society. It has been reported that patient self-rated measures of symptom intensity and QOL are as reproducible as many physiological measurements and are acceptable with respect to objectivity and stability.2 Thus, patient-based outcomes involving patient self-assessment of symptom intensity and QOL should be used in clinical research.

Conventionally, surgery is evaluated based on a simple four-grade scale: excellent, good, fair, and poor. This approach has limitations due to its subjectivity and the lack of clear definitions for each grade. Therefore, the evaluation of treatment results depends on individual researchers and is not fully comparable. Furthermore, the four-grade scale is not sufficient to measure pain intensity, activities of daily living, or the ability to work. For example, a patient might not be able to return to work despite a decrease in pain, or there may be no alleviation of pain intensity despite an improvement in activities of daily living. Given such circumstances, an improvement in one dimension does not necessarily mean an improvement in other dimensions; thus, the evaluation of medical treatments must be multidimensional and include patient-based outcomes. Given these perspectives, the Assessment Standards Committee prepared this report dealing with the new standards for evaluating the results of treatments for lumbar spine disorders.

Materials and methods

Selection of lumbar spine disorders evaluation items

The aim of this study was to establish a multidimensional method for evaluating treatment results for lumbar spine disorders that was centered on patient-based outcomes and that could be used internationally. Pain intensity can be measured using a visual analogue scale (VAS) and the NASS questionnaire.3,4 The Roland-Morris disability questionnaire (RDQ) and the Oswestry disability questionnaire are low back pain-specific QOL questionnaires.5,6 A Japanese version of the RDQ has been developed that conforms to psychometric standards in the areas of reliability, validity, and responsiveness.7 Both alleviation of patients’ symptoms and its impact on their activities of daily living can be measured using the RDQ. Widely used international measures of general well-being include the SF-36, SF-12, and the Euro QOL; Japanese versions of the SF-36 and the Euro QOL have been developed.8–11 Thus, it would be desirable to use the VAS for measuring the intensity of low back pain, the RDQ for measuring low back pain-specific QOL, and the SF-36 for assessing general well-being. However, evaluating all of these in daily practice is impractical owing to the large number of items. The RDQ takes approximately 5 min to complete, and the SF-36 takes 10 min.1 To reduce the number of items necessary to evaluate the efficacy of treatments for lumbar spine disorders, the usefulness of various evaluation criteria for differentiating patients with lumbar spine disorders from normal subjects was studied.

Examination of the evaluation rating score (true value) in the lumbar spine disorders group

Eight institutions (including affiliated institutions) were asked to recruit at least 40 subjects during the period from February to May 2002. The questionnaire consisted of a total of 60 items: 24 items derived from the Japanese version of the RDQ and 36 items derived from the Japanese version of the SF-36. Lumbar disc herniation and lumbar canal stenosis were the two main targets. Subjects who had other orthopedic disorders and those with impaired ability to understand the questions, such as patients with dementia, were excluded. Normal subjects were defined as adults with no orthopedic disorders. Adults living independently and not requiring nursing care but who were undergoing alternative treatments (e.g., acupuncture, moxibustion, massage, and chiropractic treatments) were included in the control group. Health care professionals were excluded.

Prior to conducting the investigations, subjects in the patient group and the control group gave their written informed consent.

Background characteristics of the patient group

The distribution of subjects’ background characteristics, such as age, diagnosis, Japanese Orthopaedic Association (JOA) score, and finger to floor distance, was analyzed to verify that the group represents the general population of patients with spine disorders.

Examination of removable candidate QOL items

A QOL item could be removed if it satisfied any of the following criteria: (1) most subjects gave the same answer to the item; (2) the answers to the item were highly correlated with the answers to other questions; (3) the item could be explained by several other questions; (4) the item's score distribution did not differ significantly between the patient and control groups.
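Criterion (1) can be illustrated with a minimal sketch. The 80% cutoff, the function names, and the toy answers below are assumptions chosen for illustration only; the text does not specify a numerical cutoff for this criterion or how nonresponses were handled.

```python
from collections import Counter


def dominant_answer_share(answers):
    """Return the most common answer and the fraction of valid responses it covers.

    Nonresponses are represented as None and are ignored (an assumption).
    """
    counts = Counter(a for a in answers if a is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no valid answers")
    answer, n = counts.most_common(1)[0]
    return answer, n / total


def flag_uniform_items(responses, threshold=0.80):
    """Return the items whose dominant answer share reaches the threshold.

    responses: dict mapping an item label to a list of answers.
    threshold: hypothetical cutoff, not taken from the study.
    """
    return sorted(
        item
        for item, answers in responses.items()
        if dominant_answer_share(answers)[1] >= threshold
    )


# Hypothetical toy data, not study data.
toy = {
    "item_A": ["no"] * 9 + ["yes"],         # 90% "no"
    "item_B": ["no"] * 5 + ["yes"] * 5,     # evenly split
    "item_C": ["yes"] * 8 + ["no", None],   # 8 of 9 valid answers are "yes"
}
flagged = flag_uniform_items(toy)
print(flagged)
```

An item flagged this way carries little information for distinguishing respondents, which is why it becomes a candidate for removal.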

To test these conditions, the distributions of responses to the RDQ and SF-36 were compared between the two groups. Correlations between questions in the patient group were analyzed using the Spearman correlation coefficient.
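As a rough illustration of the correlation step, the Spearman coefficient can be computed as the Pearson correlation of average ranks, which handles the many ties that questionnaire answers produce. This is a sketch in plain Python; the function names and the toy answers are assumptions, and the software actually used in the study is not stated.

```python
import math


def average_ranks(values):
    """Assign 1-based ranks; tied values receive the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant item")
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Spearman rank correlation, computed as Pearson correlation of average ranks."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical yes/no answers (1 = yes, 0 = no) to two items from eight respondents.
item_a = [0, 0, 1, 1, 1, 0, 1, 0]
item_b = [0, 0, 1, 1, 0, 0, 1, 1]
rho = spearman(item_a, item_b)
print(round(rho, 3))
```

Items whose pairwise coefficients are consistently high form a mutually correlated group, from which only some items need to be retained.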

Examination of the identification rate by discrimination analysis of candidate items

After using the above-described criteria to identify the candidate items to be included in the final questionnaire, discrimination analysis was performed to further reduce the number of items. Each candidate item in turn was set as the objective variable, and the remaining items were examined as explanatory variables; the discrimination rate was then analyzed, and items with a minimum discrimination rate ≥70% were considered to be items that could be excluded. The final excluded items were determined by examining the explanatory variables selected by discrimination analysis when each candidate to-be-excluded item was set as the objective variable.

Results

Background characteristics of the patient group

Table 1 shows the age, sex, and diagnosis of 328 subjects in the patient group and 213 subjects in the control group. There was a significant difference in sex and age distribution between the two groups (P = 0.03, Fisher's exact test). In the patient group, the straight leg raising (SLR) test was positive in approximately 40%, sensory disturbance was present in 60%, muscle weakness was seen in 40%, and bladder function was impaired in approximately 10% of the subjects (Table 2). The distribution of the finger to floor distance revealed that the mobility of the lumbar spine in the patient group was significantly restricted compared to that of the control group. Although no definitive conclusion can be drawn, given the above results we considered that the patient group represented the general population of patients with lumbar spine disorders.

Table 1 Demographics of patients and controls
Table 2 Clinical findings

RDQ

The nonresponse rate for the RDQ was less than 5% for all questions; no questions were difficult to answer. As expected, more than 95% of the normal subjects answered “no” to all questions. In the patient group, more than 80% of respondents chose the same answers for items 15, 19, and 21; and approximately 80% chose the same answer for items 3 and 23. In particular, for items 15 and 19, more than 80% of the patient group chose the same answer (no) as the normal healthy subjects (Table 3). Therefore, based on these results, RDQ-15 and RDQ-19 were listed as candidates to be excluded.

Table 3 Results of the RDQ (Roland-Morris Disability Questionnaire)

SF-36

The nonresponse rate for the SF-36 was less than 5% for all questions, and none of the questions was difficult to answer. There was a statistically significant difference in the distribution of responses between the patient group and the control group (P < 0.05, χ² test). Furthermore, there were no questions for which the answers were predominantly concentrated on one choice in the patient group.
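The comparison of response distributions can be sketched as a Pearson χ² statistic on a groups-by-choices table of counts. The counts and names below are hypothetical and serve only to show the calculation; the critical values are the standard upper 5% points of the χ² distribution.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Upper 5% critical values of the chi-square distribution for 1 to 5 degrees of freedom.
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}

# Hypothetical counts for one three-choice item (rows: patients, controls).
toy_table = [
    [20, 30, 50],
    [50, 30, 20],
]
stat, dof = chi_square_statistic(toy_table)
significant = stat > CRITICAL_5_PERCENT[dof]
print(round(stat, 2), dof, significant)
```

A statistic above the critical value for its degrees of freedom corresponds to P < 0.05, meaning the item distinguishes the two groups and does not meet removal criterion (4).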

Correlation coefficient for each question in the patient group

For the 24 RDQ items, there were mutual correlations within each of two groups of items: RDQ-1, 3, 10, 17, 21, and 23 (six items); and RDQ-4, 7, 9, 12, 16, and 24 (six items). For the SF-36 items, there were mutual correlations within each of four groups: QOL-1, 2, 11a, 11b, and 11d (5 items); QOL-3a–3j (10 items); QOL-4a–4d, 5a–5c, 6, 7, 8, and 10 (11 items); and QOL-9a–9i (9 items). Thus, 33 items were excluded, and 27 remained as candidates for adoption. The reasons for exclusion are shown in Table 4.

Table 4 Exclusion and adoption of items (first level)

Discrimination analysis

The discrimination rate for each item was determined by discrimination analysis for the 27 candidates for adoption. To obtain the discrimination rate, one item was set as the objective variable and the other items were set as explanatory variables; items with a high minimum discrimination rate were considered for exclusion. The minimum discrimination rate was > 70% in four items (RDQ-1, 5, 14, and 16) (Table 5). The discrimination rate was calculated as the proportion of answers in the patient group that agreed with the answers estimated by the classification rule. To compute the κ value, we constructed a contingency table of the patient group's actual answers against the estimated answers.
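The two agreement measures described above can be sketched as follows. The sketch assumes that the κ value is Cohen's kappa, which the text does not state explicitly, and it takes the estimated answers as given rather than fitting a classification rule. All names and data are hypothetical.

```python
from collections import Counter


def discrimination_rate(observed, predicted):
    """Proportion of respondents whose observed answer equals the estimated answer."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    agree = sum(o == p for o, p in zip(observed, predicted))
    return agree / len(observed)


def cohen_kappa(observed, predicted):
    """Cohen's kappa: agreement corrected for the agreement expected by chance."""
    n = len(observed)
    p_observed = discrimination_rate(observed, predicted)
    obs_counts = Counter(observed)
    pred_counts = Counter(predicted)
    categories = set(obs_counts) | set(pred_counts)
    # Chance agreement from the marginal totals of the contingency table.
    p_chance = sum(obs_counts[c] * pred_counts[c] for c in categories) / n ** 2
    if p_chance == 1:
        raise ValueError("kappa is undefined when all answers fall in one category")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical answers of ten patients to one item (1 = yes, 0 = no)
# and the answers estimated for them from the remaining items.
observed = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

rate = discrimination_rate(observed, predicted)
kappa = cohen_kappa(observed, predicted)
print(round(rate, 2), round(kappa, 2))
```

In this toy case eight of ten answers agree, while half of the agreement would be expected by chance from the marginal totals, so κ is lower than the raw rate. Reporting both guards against a high discrimination rate that merely reflects most respondents giving the same answer.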

Table 5 Results of discrimination analysis

Adoption of the explanatory variables in discrimination analysis

To verify whether it would be appropriate to exclude RDQ-1, 5, 14, and 16, the explanatory variables chosen for each objective variable in discrimination analysis were determined (Table 6). Consequently, it was found that if RDQ-1 and 5 were excluded, RDQ-14 and 16 could not be excluded because RDQ-1 and 5 would be necessary. Given these results, 25 of the 27 candidate items for adoption were adopted; RDQ-1 and 5 were excluded.

Table 6 Explanatory variable chosen for every objective variable on discrimination analysis

Discussion

Several issues must be considered when creating a new evaluation method for medical treatments. First, the evaluation should be structured so the effect of medical intervention is accurately reflected. If medical treatment results are mainly determined by genetic or environmental factors, the quality of the treatment cannot be evaluated. Second, the evaluation of medical treatment results must contain a framework that accurately and reliably captures changes in the patient's health condition. Finally, to evaluate the medical treatment results accurately, the treatment evaluation period should be the same as the time period during which information is obtained about the patients’ complications and social background that can affect the medical treatment outcomes.

Evaluation of medical treatment outcomes used to be a subject of concern for health care professionals only. Recently, however, the evaluation of medical treatment outcomes is becoming more of a concern to patients and governments who pay the medical costs. Evaluating medical treatment results is key to assessing cost effectiveness and to validating treatments themselves. Thus, criteria used for the evaluations should be objective and structured in such a way that the patients’ perspective is respected. In this way, the results can be understood not only by health care professionals but also by patients and third parties. Evaluation of medical treatment based on the creation of standards can be used to document and improve the performance of the medical system and health care technology.

This study has several limitations. There was a significant difference in sex and age between the patient group and the control group. Hence, this difference may have affected the results of our study. For many research purposes, it may be optimal to include both disease-specific (RDQ) and generic functional status measures (SF-36). However, an instrument that includes both disease-specific and general functional status measures has not been established. Although it may not be ideal to combine items from two different instruments, our final goal was to find the disease-specific daily functions, physical function, role function, pain, vitality, mental health, and health perception. However, only the Japanese version of the RDQ as the disease-specific measure and the Japanese version of the SF-36 as the generic functional status measure were available. Therefore, we combined items from the two instruments to find the disease-specific functional status that included many dimensions.

We were able to identify 25 specific questions that would elucidate the QOL of patients with various lumbar spine disorders. The next step is to assess the validity and responsiveness of the questionnaire that includes the selected 25 questions by measuring the outcomes of patients with lumbar spine disorders. We also have to complete cross-cultural adaptation of the Back Pain Evaluation Questionnaire (BPEQ) so that it can be used internationally.