Cervical degenerative index: a new quantitative radiographic scoring system for cervical spondylosis with interobserver and intraobserver reliability testing

Background The lack of a widely available scoring system for cervical degenerative spondylosis encouraged the authors to establish and validate a systematic quantitative radiographic index. Materials and methods This study included intraobserver and interobserver reliability testing among three reviewers with different levels of experience. Each observer independently scored the four-view cervical radiographic series of 48 patients on two separate occasions, and the grading was analyzed statistically. Results Intraobserver and interobserver reliability were high for the two experienced observers. Reliability between the less experienced observer and the more experienced observers was fair. Conclusions The cervical degenerative index appears to be a reliable and reproducible radiographic assessment of cervical spondylosis. The index will have direct applicability for longitudinal study of cervical spondylosis and may be clinically relevant as well.


Introduction
When patients are being evaluated for neck pain, plain radiographs are typically obtained before three-dimensional imaging, and the films often reveal cervical spondylosis. Patients frequently ask whether the radiographs show significant degenerative findings (and whether they exclude tumor or infection). During long-term follow-up of either nonoperative or operative management, they often wish to know whether the underlying cervical spondylosis has progressed or remained stable.
Having a simple method that is both quantitative and qualitative to assess the magnitude of cervical spondylosis would also be useful in research studies. In a review of the cervical spine in patients who had undergone long fusions of the thoracolumbar spine for scoliosis, we found a need for such a reproducible method. This led us to establish and validate such an index.

Review of the literature
Qualitative radiographic evaluation of degeneration of the cervical spine has been studied in the normal population. Kellgren et al. [1,2] developed a set of criteria to classify degenerative spondylosis based upon lateral cervical spine radiographs in a normal population sample. The classification was a five-grade scale, ranging from "0," for the absence of degeneration, up to "4," for severe narrowing of the disc space with sclerosis and large osteophytes. Grade 0 represents absence of disc degeneration; Grade I represents minimal anterior osteophytosis; Grade II represents definite anterior osteophytosis with possible narrowing of the disc space and some sclerosis of the vertebral plates; Grade III represents moderate narrowing of the disc space with definite sclerosis of the vertebral plates and osteophytosis; and Grade IV represents severe narrowing of the disc space, sclerosis of the vertebral plates, and multiple large osteophytes.
Cote et al. [3] evaluated the reliability of this classification system in patients having neck pain and reported it to be a reliable tool for research purposes.
Gore et al. [4] expanded the grading criteria for degenerative change to include three parameters: disc space narrowing, endplate sclerosis, and anterior/posterior osteophytes [3]. Grade 0 was no disc space narrowing, no endplate sclerosis, and no osteophyte formation. Grade I was 25% narrowing of the disc space, with barely visible endplate sclerosis and osteophyte formation. Grade II was 50% narrowing of the disc space, with moderate endplate sclerosis and moderate-size osteophytes. Grade III was 75% narrowing of the disc space, with severe endplate sclerosis and large osteophytes.
To the best of our knowledge, a detailed quantitative assessment of cervical spondylosis has not been published to date. Our study of the natural history of the cervical spine in patients with scoliosis led us to establish and validate this quantitative radiographic index.

Materials
The study has been performed according to the Declaration of Helsinki and was approved by the Ethical Committee. Enrolled patients gave informed consent.
In a separate Institutional Review Board (IRB)-approved study of the cervical spine in patients who had undergone a long fusion from the thoracic spine to the sacrum for scoliosis, we obtained routine cervical radiographs (AP, lateral, and lateral flexion and extension views) in 48 patients (average age, 56 years). The upper end vertebra of the fusion was at T2 or T3 in 44% of patients, T4 or T5 in 36%, and T6-T10 in 20%. The current analysis is based on these radiographs. The incidence and severity of cervical changes in these patients are the subject of another study [5].
Exclusion criteria included congenital cervical anomalies, trauma, prior cervical surgery, rheumatoid arthritis, infections, tumors, ankylosing spondylitis, ossification of the posterior longitudinal ligament (OPLL), diffuse idiopathic skeletal hyperostosis (DISH), and any other inflammatory disease involving the cervical spine.

Index development, rating, and scoring
In developing this quantitative cervical degenerative index (CDI), we expanded upon the three radiographic criteria reported by Gore [4]. The CDI includes the three factors in Gore's original evaluation (disc space narrowing, endplate sclerosis, and osteophyte formation) and a fourth factor, olisthesis, either anterior or posterior. The assessment is based upon a standard four-view (AP, lateral, flexion, and extension) cervical radiographic series. The quantitative scores for the four factors are summed to obtain the final CDI score.
Each of the four factors being assessed for degenerative radiographic appearance is graded at each level, from C2-C3 through C6-C7, on a four-point scale ranging from 0 to 3. For each factor, a normal appearance yields a score of "0," and the most severe spondylotic change yields a score of "3." Each factor except sclerosis has specific quantitative criteria, as outlined in Table 1. Thus, a lower score represents a more normal-appearing radiograph, and a higher score represents more degenerative spondylotic change on the radiograph (Figs. 1, 2, 3, 4).
During the pilot phase of this study, it became evident that many patients had large anterior osteophytes as well as posterior osteophytes, which would affect the assessment of spondylosis at a given segment. When scoring for the index, we therefore evaluated and recorded anterior and posterior osteophytes separately, but used the larger value of the two (i.e., the worse degenerative change) to calculate the final score. We also noted a number of patients with significant facet joint sclerosis in addition to endplate sclerosis. Because this was to be an assessment of sclerosis of the segment as a whole, we similarly used the higher value of endplate sclerosis and facet joint sclerosis for that segment (Fig. 5).
The CDI includes scoring by factor (i.e., disc space narrowing, endplate/facet sclerosis, osteophyte formation, and presence/absence of an olisthesis), by level (i.e., each segment from C2-C3 through C6-C7), and an overall cumulative score. Factor scores range from 0 (normal) to 15 (most severe), since each is the sum of a patient's scores for that factor across the five segments (5 × 3 = 15). Similarly, level scores range from 0 (normal) to 12 (most spondylotic), since each is the sum of the four factor scores for that level (4 × 3 = 12). The overall CDI is the sum of the four factor scores (equivalently, the sum of the five level scores), giving a possible range from 0 (completely nonspondylotic appearance) to 60 (most severe degeneration at every level).
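The scoring rules above reduce to two maxima per segment followed by sums over levels and factors. The following is a minimal Python sketch under stated assumptions: the names `LevelRating` and `score_cdi`, the field names, and the example grades are all hypothetical, and each raw 0-3 grade is assumed to have already been assigned from the Table 1 criteria, which are not reproduced here.

```python
from dataclasses import dataclass

# Segment and factor labels (illustrative naming, not taken from the study's forms).
LEVELS = ("C2-C3", "C3-C4", "C4-C5", "C5-C6", "C6-C7")
FACTORS = ("disc_narrowing", "sclerosis", "osteophytes", "olisthesis")


@dataclass
class LevelRating:
    """Raw 0-3 grades recorded for one motion segment (hypothetical fields)."""
    disc_narrowing: int
    endplate_sclerosis: int
    facet_sclerosis: int
    anterior_osteophyte: int
    posterior_osteophyte: int
    olisthesis: int

    def factor_scores(self) -> dict:
        """Collapse the raw grades into the four CDI factors for this segment."""
        for name, value in vars(self).items():
            if not isinstance(value, int) or not 0 <= value <= 3:
                raise ValueError(f"{name} must be an integer from 0 to 3, got {value!r}")
        return {
            "disc_narrowing": self.disc_narrowing,
            # The worse of endplate and facet sclerosis counts for the segment.
            "sclerosis": max(self.endplate_sclerosis, self.facet_sclerosis),
            # The worse of anterior and posterior osteophytes counts for the segment.
            "osteophytes": max(self.anterior_osteophyte, self.posterior_osteophyte),
            "olisthesis": self.olisthesis,
        }


def score_cdi(ratings: dict) -> tuple:
    """Return (factor_scores, level_scores, total_cdi) for one patient.

    `ratings` maps every level in LEVELS to a LevelRating.
    """
    if set(ratings) != set(LEVELS):
        raise ValueError(f"expected ratings for exactly {LEVELS}")
    per_level = {lvl: ratings[lvl].factor_scores() for lvl in LEVELS}
    # Level score: sum of the four factor scores at one segment (0-12).
    level_scores = {lvl: sum(per_level[lvl].values()) for lvl in LEVELS}
    # Factor score: sum of one factor across the five segments (0-15).
    factor_scores = {f: sum(per_level[lvl][f] for lvl in LEVELS) for f in FACTORS}
    # Total CDI (0-60); equals the sum of the level scores by construction.
    total = sum(factor_scores.values())
    return factor_scores, level_scores, total


# Usage with made-up grades: C4-C5 and C5-C6 degenerated, other levels normal.
normal = LevelRating(0, 0, 0, 0, 0, 0)
example = {lvl: normal for lvl in LEVELS}
example["C4-C5"] = LevelRating(1, 1, 0, 1, 0, 1)
example["C5-C6"] = LevelRating(2, 1, 3, 2, 1, 0)
factors, levels, cdi = score_cdi(example)
print(factors)  # {'disc_narrowing': 3, 'sclerosis': 4, 'osteophytes': 3, 'olisthesis': 1}
print(levels["C5-C6"], cdi)  # 7 11
```

In the C5-C6 example, facet sclerosis (3) outranks endplate sclerosis (1), so the segment's sclerosis score is 3, mirroring the "worst of the two" rule described in the text.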
Three reviewers (a spine fellow and two attending spine surgeons with 8 and 16 years in practice) reviewed each set of radiographs independently and in a blinded fashion. Completed CDI assessment forms were submitted to the research department. The films were then cleaned and returned to the surgeons for a second assessment, with a minimum of 2 weeks between evaluations. At no time during the review process did a reviewer have access to their own first assessment or to the other reviewers' assessments.

Statistical methodology
Analysis of variance models and intraclass correlation coefficients (ICC) were used to assess intraobserver and interobserver reliability. Factor scores, level scores, and the overall CDI were analyzed. All analyses were performed using SPSS 10.0 (Chicago, IL).
Fig. 1 In this patient, the different grades of disc space narrowing were rated by one observer as C2/3 = 0, C6/7 = 1, C4/5 = 2, C5/6 = 2, and C3/4 = 3.
Fig. 2 Severe endplate sclerosis is seen in (a), and severe sclerosis of the facets at C2/3 without severe endplate sclerosis in (b). Both were scored as 3 points.

Results
The descriptive statistics for the factor scores (each a summation over all five segments) are presented in Table 2. No patient received the maximum factor score of 15 points from any of the raters, indicating that the range of the scale is adequate, without an obvious ceiling effect. The mean sclerosis score was 5.8, disc space narrowing 5.0, osteophytes 3.3, and listhesis 2.2 (Table 2). Table 3 (summation of factor scores for a given level) showed the C5-C6 level to have the highest mean level score (4.5 ± 2.8), and the C2-C3 level the lowest (0.9 ± 1.3). The mean CDI was 16.4 (± 10.1), with a minimum of 0 and a maximum of 46 (Table 3).
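Because each patient's CDI is, by construction, the sum of the four factor scores, and the mean of a sum equals the sum of the means, the reported factor means should add up to the mean CDI up to rounding of the published values. This check assumes that both sets of means were taken over the same set of ratings:

```latex
\overline{\mathrm{CDI}}
  = \bar{F}_{\text{sclerosis}} + \bar{F}_{\text{disc}}
  + \bar{F}_{\text{osteophyte}} + \bar{F}_{\text{olisthesis}}
  \approx 5.8 + 5.0 + 3.3 + 2.2 = 16.3 \approx 16.4
```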

Intraobserver and interobserver reliability
Intraobserver reliability for the total score, measured by the intraclass correlation coefficient (ICC), was excellent for the two senior observers (0.89 and 0.87) but only fair for the less experienced observer (0.45). The ICC remained excellent at each level except C2-C3, likely because that level was more often normal-appearing, so that a change in score (on average 0.6-0.7) represented a larger percentage change for that level (Table 4). Interobserver reliability between the two experienced raters was excellent, with an ICC of 0.86. However, the overall ICC of the three raters was only 0.58 (in the upper end of the fair category), with ICCs of only 0.50 between raters 1 and 3 and 0.58 between raters 1 and 2. These lower values reflect the less consistent scoring of the less experienced observer. There was a trend toward an association between level of experience and scoring, with the less experienced observer tending to give higher scores on initial evaluation (Tables 5, 6).
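The study does not state which ICC model was run in SPSS. As one plausible illustration only, the sketch below computes ICC(2,1) (two-way random effects, absolute agreement, single rating) from the two-way ANOVA mean squares, and labels a value using cutoffs we assume here (poor < 0.40, fair 0.40-0.59, good 0.60-0.74, excellent ≥ 0.75). Those cutoffs are consistent with how the values above are described but are not given in the study, and the function names are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single rating.

    scores: one row per patient, one column per rater (or per reading session).
    """
    n = len(scores)
    k = len(scores[0]) if n else 0
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need a rectangular table with at least 2 rows and 2 columns")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares without replication.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters
    ss_total = sum((v - grand) ** 2 for row in scores for v in row)
    ss_error = ss_total - ss_rows - ss_cols                 # residual

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def interpret_icc(icc):
    """Map an ICC to a category using assumed cutoffs (not stated in the study)."""
    if icc < 0.40:
        return "poor"
    if icc < 0.60:
        return "fair"
    if icc < 0.75:
        return "good"
    return "excellent"


# Made-up total CDI scores: 5 patients, each read twice by one observer.
readings = [[10, 12], [25, 24], [3, 4], [40, 37], [16, 18]]
value = icc_2_1(readings)
print(round(value, 2), interpret_icc(value))
```

For intraobserver reliability, each row holds one patient's scores from a single observer's first and second readings; for interobserver reliability, each column holds one rater's scores. A consistency-type model such as ICC(3,1) would ignore systematic offsets between raters, such as the less experienced observer's tendency to score higher, whereas the absolute-agreement form used here penalizes them.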

Discussion
Cervical spondylosis is a generic term for the degenerative cascade that may affect the entire cervical spine and may be seen radiographically in both symptomatic and asymptomatic individuals. It encompasses a sequence of degenerative changes that often begin in the intervertebral disc space and may lead to changes in the surrounding bony anatomy and soft tissues [6,7]. Katz et al. [8] proposed that a number of pathological processes cause spondylosis and lead to vertebral "endplate sclerosis." Lee et al. [9], in a study of radiographic density and the sagittal diameter of the cervical spine, evaluated the thickness of the endplate region at the C5 level in 200 patients. They concluded that "endplate sclerosis" does not correlate with symptoms and therefore has little value as a sign on cervical radiographs. Our decision to evaluate facet sclerosis is based on the general understanding that deterioration of the articular cartilage of synovial joints can produce clinical symptoms. This is further supported by the initial protocols for intervertebral disc replacement, which call for evaluation of facet joint arthrosis and treat significant degenerative change there as a relative contraindication to replacement. In a biomechanical and imaging study of human lumbar cadaveric spines, Fujiwara et al. showed a relationship between facet joint sclerosis, osteoarthrosis, and segmental instability [10]. We added olisthesis as a dynamic component, because obvious instability on flexion/extension views suggests to us a more advanced degenerative process. This factor is also justified by the fact that clinically relevant instability does change surgical recommendations. The natural history of cervical spondylosis is associated with the aging process [4,11].
Neurological symptoms and signs can develop and are often related to the cause and time course of anatomic compression and to the structures being compressed. Although some question the value of plain radiographs [12,13], the usual practice is to obtain plain films before more advanced imaging.
In a longitudinal study of the natural history of the cervical spine in patients with scoliosis, we found the previously reported criteria to be insufficient. We therefore expanded upon those criteria, with the goal of making the CDI a quantitative scoring system both by level and across the entire cervical spine. The higher the numerical value, the more degenerative the radiographic appearance, which allows longitudinal comparison.
Of the four radiographic factors we chose, three are specifically quantitative: disc space narrowing, osteophyte formation, and listhesis. The fourth, sclerosis, is qualitative (converted to a numeric value); nevertheless, it had good intraobserver reliability. The more quantitative factors had even higher reproducibility among the experienced observers.
In order to have detailed radiographic information for each patient, we utilized a four-view cervical radiographic series, which is our standard clinical practice. Although in this group there were only a few cases in which the flexion/extension views changed the CDI, it is recognized that mechanical instability is a particularly clinically relevant finding.
The results of the study show high intra- and interobserver reliability for the two experienced clinicians. We found the CDI to be reliable and applicable to the radiographic assessment of cervical degenerative change. While our focus was to have a quantitative research method to assess the natural history of spondylotic change, we believe that a quantitative assessment also has clinical applicability. As with any assessment tool, there is a learning curve influenced by the experience of the observer, but the CDI does appear to be a simple and reproducible index.
In summary, the innovation of the CDI is that it provides a detailed quantitative radiographic assessment of spondylotic change per cervical level, per independent factor, and as a total for the entire cervical spine.
Limitations of this study include the relatively small number of observers, the use of a highly selected group of patients (all with adult scoliosis severe enough to require extensive surgery), and the lack of correlation with clinical symptoms. We plan to perform a clinical correlation study in the future.