Measuring outcomes is critical for decision-making and for evaluating results in all medical settings, and this applies equally to the management of patients with lumbar spine-related problems. The Japanese Orthopaedic Association (JOA) developed and published a specific instrument to measure outcomes for patients with low back problems in 1986.1 It was called the JOA score rating system for low back pain, with a full score of 29 points. Since then, the instrument has been widely used to evaluate the functional results of many types of intervention for patients with such problems. It has been cited not only in articles by Japanese investigators2 but also in those by non-Japanese-speaking investigators.3,4 One of the major criticisms of this instrument, however, is that it is a physician-based measurement rather than a patient-oriented one. It is now widely accepted that the patient’s perspective is essential for making medical decisions and for evaluating the results of interventions.5 Given the current need for patient-oriented outcome measurement, the JOA was urged to revise its original score rating system and to develop a new one. In 2002, a Subcommittee on Evaluation of Back Pain and Cervical Myelopathy was organized within the Clinical Outcome Committee of the JOA, and work began on revising the original JOA scoring system.

This revision process consisted of four steps: Parts 1 to 4. As described in the previous report on Part 1, the original JOA scoring system was revised and a new scoring system, the JOA Back Pain Evaluation Questionnaire (JOABPEQ), was developed.6 The key aim of this revision was to make the original JOA score more patient-oriented. For the survey in the Part 1 study, we first created a preliminary questionnaire consisting of 60 items. The questionnaire was a self-administered, disease-specific measure that was created with reference to the Japanese editions of the short form health survey with 36 questions (SF-36)7 and the Roland-Morris Disability Questionnaire (RDQ)8 to assess health-related quality of life. From the survey, a total of 25 items were selected for tentative use in a draft of the JOABPEQ (Table 1).

Table 1 Items (n = 25) selected for the draft of the JOABPEQ evaluated in this study

The purpose of the Part 2 study in this project was to evaluate the reliability of the 25 items selected for the draft JOABPEQ; for this, test-retest reliability was ascertained.

Materials and methods

Recruitment of patients

Altogether, 460 of the 829 Japanese board-certified spine surgeons were randomly selected, and each was asked to recruit two patients to evaluate the JOABPEQ between January and June 2004. The recruited patients were scheduled to answer the questionnaire twice, 2 weeks apart. Inclusion criteria were as follows: (1) any age and either sex; (2) any lumbar spine disorder with current attendance at an outpatient clinic; (3) symptom severity expected to remain at the same level between the two interviews. Exclusion criteria were: (1) other musculoskeletal diseases requiring medical treatment; (2) psychiatric disease (e.g., dementia) potentially leading to inappropriate answers; (3) a postoperative condition; (4) participation in a previous survey of the related study.

Testing the questionnaire

Each patient was asked to complete the same questionnaire twice at an interval of 2 weeks (±3 days). The attending surgeon recorded the diagnosis and the presence or absence of concomitant diseases and then judged the severity of symptoms using a three-step rating scale (mild, moderate, severe). Symptom severity was determined subjectively by the attending surgeon, who was asked not to select similar patients solely on the basis of severity. Patients whose severity was judged by the surgeons to be at the same level at both interviews were then selected and analyzed to verify the reliability of the questionnaire.

This study was approved by the Ethics Committee of the Japanese Society for Spine Surgery and Related Research. Informed consent was obtained from each patient.

The reliability of the questionnaire was evaluated from the magnitude of the kappa coefficients. The weighted kappa coefficient was calculated for items with three or more choices. The kappa and weighted kappa coefficients were calculated from a formula implemented in Microsoft Office Excel 2003. Items with kappa or weighted kappa coefficients of 0.4 or above were judged to be reliable.9 The 95% confidence intervals (95% CI) were calculated for all reliability coefficients using the bootstrap method.
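The calculation described above can be illustrated with a minimal sketch. It is not the spreadsheet formula used in this study. It assumes linear disagreement weights for the weighted kappa and a percentile bootstrap that resamples patients, neither of which is specified in the text. The function names, the number of resamples, the random seed, and the sample answers are all hypothetical.

```python
import math
import random


def cohen_kappa(first, second, categories, weighted=False):
    """Kappa for paired test and retest answers to one item.

    `categories` lists the answer choices in their natural order.
    With weighted=True, linear disagreement weights are used
    (an assumption; the weighting scheme is not stated in the text).
    """
    n = len(first)
    k = len(categories)
    index = {c: i for i, c in enumerate(categories)}

    # Observed joint proportions of (first answer, second answer).
    obs = [[0.0] * k for _ in range(k)]
    for a, b in zip(first, second):
        obs[index[a]][index[b]] += 1.0 / n
    row = [sum(obs[i]) for i in range(k)]
    col = [sum(obs[i][j] for i in range(k)) for j in range(k)]

    def weight(i, j):
        # Disagreement weight: 0 for agreement, 1 for any disagreement
        # (unweighted) or proportional to the distance (linear weighting).
        if i == j:
            return 0.0
        return abs(i - j) / (k - 1) if weighted else 1.0

    observed = sum(weight(i, j) * obs[i][j] for i in range(k) for j in range(k))
    expected = sum(weight(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0.0:
        return float("nan")  # undefined when every answer falls in one category
    return 1.0 - observed / expected


def bootstrap_ci(first, second, categories, weighted=False, n_boot=2000, seed=0):
    """Percentile 95% CI obtained by resampling patients with replacement."""
    rng = random.Random(seed)
    n = len(first)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        value = cohen_kappa([first[i] for i in idx], [second[i] for i in idx],
                            categories, weighted)
        if not math.isnan(value):
            stats.append(value)
    stats.sort()
    lower = stats[int(0.025 * (len(stats) - 1))]
    upper = stats[int(0.975 * (len(stats) - 1))]
    return lower, upper


# Hypothetical answers of 10 patients to one three-choice item (not study data).
test_answers = [1, 2, 3, 1, 2, 3, 1, 2, 3, 2]
retest_answers = [1, 2, 3, 1, 3, 3, 1, 2, 2, 2]
kappa_w = cohen_kappa(test_answers, retest_answers, [1, 2, 3], weighted=True)
ci = bootstrap_ci(test_answers, retest_answers, [1, 2, 3], weighted=True)
print(kappa_w, ci, kappa_w >= 0.4)
```

For an item with two choices the linear weights reduce to the unweighted case, so the same function covers both kinds of item.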


Patient characteristics

A total of 350 patients participated in this study and completed the questionnaire twice according to the project plan. However, 135 patients were excluded because the severity of their symptoms had changed between the two interviews or because the interval between the two tests fell outside the specified period. Of the remaining 215 patients, 54 were ineligible because of other musculoskeletal diseases, such as knee and hip osteoarthrosis. As a result, a total of 161 patients were available for the analysis in this study: 86 men and 75 women with a mean age of 57.7 years (SD 16.3 years). The clinical diagnoses were degenerative lumbar canal stenosis in 49 patients, lumbar disc herniation in 44, spondylolisthesis in 20, spondylosis in 16, degenerative disc disease in 13, mechanical low back pain in 11, and miscellaneous in 8. The patients’ ages ranged from the twenties to the eighties, and symptom severity ranged from mild to severe (Table 2). Neurological and physical status was evaluated for each patient using the current JOA score rating system and the finger-floor distance (Table 3). Neurological deficits ranged from mild to severe, and trunk flexibility also varied among the subjects.
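The patient flow reported above can be checked with simple arithmetic. The short sketch below uses only the counts given in the text; the variable names are illustrative.

```python
# Counts taken from the text; variable names are illustrative.
completed_twice = 350
excluded_severity_or_interval = 135
excluded_other_musculoskeletal = 54

remaining = completed_twice - excluded_severity_or_interval
analyzed = remaining - excluded_other_musculoskeletal

diagnoses = {
    "degenerative lumbar canal stenosis": 49,
    "lumbar disc herniation": 44,
    "spondylolisthesis": 20,
    "spondylosis": 16,
    "degenerative disc disease": 13,
    "mechanical low back pain": 11,
    "miscellaneous": 8,
}
sex = {"men": 86, "women": 75}

# The reported subgroup counts should each add up to the analyzed sample.
print(remaining, analyzed, sum(diagnoses.values()), sum(sex.values()))
```

The diagnosis and sex counts both sum to the 161 patients analyzed, so the reported flow is internally consistent.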

Table 2 Distribution of age and severity of symptoms in the patients analyzed (n = 161)
Table 3 Current Japanese Orthopaedic Association score rating system and finger-floor distance for the patients analyzed (n = 161)

Face validity

Face validity was checked in terms of the completion rate for filling out the questionnaire. The distribution of the answers for all question items was then checked to ensure that there were no biased answers. Items remaining unanswered accounted for less than 5% in the first test, and there was no skewed distribution, such as “floor and ceiling” effects (Table 4).

Table 4 Reproducibility of each item (n = 161)


The test-retest reliability was confirmed by calculating the kappa and weighted kappa coefficients for each item (Tables 5A, 5B). The kappa and weighted kappa coefficients were above 0.50 for all items except one, for which the coefficient was 0.48. The lower limit of the 95% CI exceeded 0.4 for all items except two, for which it was 0.39. This implied that the test-retest reliability of the JOABPEQ was acceptable for an outcome measure.

Table 5A Kappa coefficient with 95% CI for items Q1-1 to Q1-14
Table 5B Weighted kappa coefficient with 95% CI for items Q2-1 to Q2-11


Outcome measures are generally divided into two categories: generic and disease-specific measures.5,10 The SF-36 has been commonly used as a representative measure of generic health status.5,7,10 The RDQ and the Oswestry Disability Index are widely used as disease-specific measures for back pain.8,11 The JOA score rating system for low back pain, developed in 1986, is also a disease-specific instrument for back disorders and injuries and has been widely used in clinical research and in the decision-making process in Japan. However, it is not a patient-based outcome measure, and it is not reliable enough to describe the objective status of the function and quality of life (QOL) of patients with low back disorders. To date, there has been insufficient psychometric analysis to confirm the validity and reliability of the JOA score rating system.

The project for developing the new questionnaire, the JOABPEQ, was initiated to create a self-administered, disease-specific measure for low back pain. This instrument should cover the functions of the lumbar spine as well as health-related QOL. The reliability of the questionnaire containing the 25 proposed items was evaluated using psychometric analysis as Part 2 of this project. Kappa and weighted kappa coefficients were used to verify the test-retest reliability.12,13

In terms of external validity, some bias in the data was inevitable because one of the inclusion criteria was that the severity of the symptoms was expected to remain at the same level between the two interviews. However, there was no bias in the choice of answers to each question. This implies that the test-retest reliability was acceptable even though the subjects had symptoms of different severity. Interpretation of the questions tended to be poorer in older patients, and this study included only small numbers of younger patients, such as those in their thirties and forties. Thus, the reliability would not be expected to deteriorate even if the number of young people were to increase.

In terms of English expression, questions Q1-2 and Q1-11 may be ambiguous because the double negatives (two “no’s” in the answer) can be confusing. The English wording should be reconsidered and revised so that it is more easily understood by native English speakers. The number of answer choices ranged from two to five across the questions, which is also a point to be reconsidered in the future.

The current study demonstrated that the 25 items were sufficiently reliable to describe the QOL of patients with low back disorders. However, further studies are needed to complete the project, including a factor analysis to determine the underlying clusters of the questionnaire items, development of a formula for calculating the severity score, and confirmation of the responsiveness of the questionnaire.


The tentative JOABPEQ with 25 items was confirmed to be reliable enough to describe the QOL of patients suffering low back disorders.