European Spine Journal, Volume 24, Supplement 2, pp 236–251

Predictors of outcome in patients with degenerative cervical spondylotic myelopathy undergoing surgical treatment: results of a systematic review

  • Lindsay A. Tetreault
  • Alina Karpova
  • Michael G. Fehlings
Open Access Review Article

DOI: 10.1007/s00586-013-2658-z

Cite this article as:
Tetreault, L.A., Karpova, A. & Fehlings, M.G. Eur Spine J (2015) 24(Suppl 2): 236. doi:10.1007/s00586-013-2658-z

Abstract

Purpose

To conduct a systematic review of the literature to determine important clinical predictors of surgical outcome in patients with cervical spondylotic myelopathy (CSM).

Methods

A literature search was performed using MEDLINE, MEDLINE in Process, EMBASE and Cochrane Database of Systematic Reviews. Selected articles were evaluated using a 14-point modified SIGN scale and classified as either poor (<7), good (7–9) or excellent (10–14) quality of evidence. For each study, the association between various clinical factors and surgical outcome, evaluated by the (modified) Japanese Orthopaedic Association scale (mJOA/JOA), Nurick score or other measures, was defined. The results from the EXCELLENT studies were compared to the combined results from the EXCELLENT and GOOD studies which were compared to the results from all the studies.

Results

The initial search yielded 1,677 citations. Ninety-one of these articles, including three translated from Japanese, met the inclusion and exclusion criteria and were graded. Of these, 16 were excellent, 38 were good and 37 were poor quality. Based on the EXCELLENT studies alone, a longer duration of symptoms was associated with a poorer outcome evaluated on both the mJOA/JOA scale and Nurick score. A more severe baseline score was related to a worse outcome only on the mJOA/JOA scale. Based on the GOOD and EXCELLENT studies, duration of symptoms and baseline severity score were consistent predictors of mJOA/JOA, but not Nurick. Age was not a significant predictor on any of the functional outcome measures considered.

Conclusion

The most important predictors of outcome were preoperative severity and duration of symptoms. This review also identified many other valuable predictors including signs, symptoms, comorbidities and smoking status.

Keywords

Cervical spondylotic myelopathy; Clinical predictors; Surgical outcome; Systematic review

Introduction

Cervical spondylotic myelopathy (CSM) is the most common cause of spinal cord dysfunction worldwide. The disease is caused by the degeneration of various components of the vertebra, including the vertebral body, the intervertebral disk, the supporting ligaments and the facet joints [1]. Static factors, including the protrusion of osteophytic spurs (spondylosis), disk desiccation, ossification of the posterior longitudinal ligament (OPLL) and hypertrophy of the ligamentum flavum, may lead to the narrowing of the spinal canal and to cord compression [2]. Longstanding compression of the spinal cord can result in irreversible damage including demyelination and necrosis of the gray matter. The onset of CSM is generally insidious and progresses in a stepwise fashion [3, 4]. Upon diagnosis of symptomatic CSM, a physician often recommends surgical treatment to decompress the spinal cord [5]. Surgery has proven to be an effective intervention for the full range of myelopathy severity [6].

Given that CSM is a prevalent cause of spinal cord injury, and since surgery is often an appropriate intervention, it would be useful to identify the most important predictors of surgical outcome. Prediction is a valuable tool in a clinical setting. Anticipating a patient’s likely surgical outcome can help determine which patients are most likely to benefit from surgery and help estimate their degree of functional improvement [7]. This allows surgeons to provide valuable prognostic information to concerned patients, helping to manage expectations, as well as to implement and direct appropriate treatment programs.

Holly et al. [8] conducted a similar systematic review of the literature and found that the most common predictors of surgical outcome for patients with CSM were age, duration of symptoms and severity of myelopathy. These three clinical factors are most frequently reported in the literature. Controversy still remains as to the significance, strength and direction of the relationship between surgical outcome and age, duration of symptoms and baseline severity.

The objective of this paper is to conduct a comprehensive literature search to determine the most important clinical predictors of outcome in surgically treated CSM patients. This paper will address whether age, duration of symptoms and baseline severity score are indeed predictors, and will also examine other clinical factors, including comorbidities, smoking status, signs and symptoms, to determine their predictive value.

Materials and methods

A literature search was performed using MEDLINE, MEDLINE in Process, EMBASE and Cochrane Central Register of Controlled Trials. The keywords used for the search were Cervical Spondylotic Myelopathy AND Surgery or Postoperative AND Prediction/Prognosis AND observational studies. The search was limited to humans, aged 18 years or older. The total number of citations found for this review was 1,677.

Articles were included if they were observational studies on patients >18 years with degenerative cervical myelopathy, treated surgically and followed postoperatively. Articles must have either directly or indirectly assessed the ability of a clinical factor to predict surgical outcome. Articles were eliminated if they were review articles or opinions; studies on patients with traumatic spinal cord injuries, thoracic myelopathy, radiculopathy, or non-degenerative cervical myelopathy; studies assessing only radiographic factors as predictors and studies that used complications as an outcome measure. Articles that were not in English or Japanese were excluded. Japanese articles were translated by Dr. Iwasaki and were included in the analysis.

All 1,677 abstracts and titles were reviewed independently by two authors (LAT, AK) and were sorted based on pre-determined inclusion and exclusion criteria. Figure 1 displays the search and review process in detail. Ninety-one articles were included. Three of these were translated from Japanese to English. Each article was assessed for quality with respect to methodology and overall structure. Several rating scales were examined, including Altman [9], Hayden et al. [10], and the Scottish Inter-Collegiate Guidelines Network (SIGN) scale for prognostic studies [11]. A modified version of the SIGN scale was used to rate the articles.
Fig. 1

Search strategy and detailed review process. CSM cervical spondylotic myelopathy

A modified version of the SIGN scoring system was implemented in a systematic review published by Kalsi-Ryan and Verrier [12]. Since the incidence of spinal cord injury is comparatively low, high quality research in this field is challenging. Studies often have small sample sizes with no opportunity for blinded assessment and randomization. Kalsi-Ryan and Verrier modified SIGN so that it was more specific to the nature of literature they were reviewing. We selected the modified SIGN system and further altered it to increase its applicability to literature reporting clinical predictors of surgical outcome in patients with CSM. Questions 15 and 16 were changed from dichotomous scoring to trichotomous scoring as studies may vary greatly in quality of statistical analysis, methodology and bias elimination (Table 1). It was arbitrarily decided that an article whose score was <7 would be classified as POOR, 7–9 as GOOD and 10–14 as EXCELLENT. The results from the EXCELLENT studies were compared to the combined results from the EXCELLENT and GOOD studies which were compared to the results from all the studies.
Table 1

Modified Scottish Inter-collegiate Guidelines Network (SIGN) used to rate all articles

Checklist item (original) | Original scoring | Dichotomous scoring | Comments/interpretations
1. The study addresses an appropriate and clearly focused question | 6-point scale | Yes = 1; No = 0 |
2. The two groups being studied are selected from source populations that are comparable in all respects other than the factor under investigation | 6-point scale | |
3. The study indicates how many of the people asked to take part did so, in each of the groups being studied | 6-point scale | Yes = 1; No = 0 | Retrospective studies receive a score of zero
4. The likelihood that some eligible subjects might have the outcome at the time of enrolment is assessed and taken into account in the analysis | 6-point scale | |
5. What percent (Did any) of individuals or clusters recruited into each arm of the study dropped out before the study was completed? | % | Yes = 0; No = 1 | Retrospective studies receive a score of zero unless they commented on drop-off rate (>80 %). Prospective studies with a >80 % follow-up receive a 1
6. Comparison is made between full participants and those lost to follow-up, by exposure status | 6-point scale | |
7. The outcomes are clearly defined | 6-point scale | Yes = 1; No = 0 |
8. The assessment of outcome is made blind to exposure status | 6-point scale | |
9. Where blinding was not possible, there is some recognition that knowledge of exposure status could have influenced the assessment of outcome | 6-point scale | |
10. The measure of assessment of exposure is reliable | 6-point scale | Yes = 1; No = 0 |
11. Evidence from other sources is used to demonstrate that the method of outcome assessment is valid and reliable | 6-point scale | Yes = 1; No = 0 |
12. Exposure level or prognostic factor is assessed more than once | 6-point scale | Yes = 1; No = 0 |
13. The main potential confounders are identified and taken into account in the design and analysis | 6-point scale | Yes = 1; No = 0 | The main confounders are age, duration of symptoms and baseline severity score
14. Have confidence intervals been provided? | 6-point scale | Yes = 1; No = 0 | Have the results been reported using good statistical methods?
15. How well was the study done to minimize the risk of bias or confounders, and to establish a causal relationship between exposure and effect? | Code ++, + or - | Yes = 1; No = 0 | Trichotomous rating (a): 0 = poor; 1 = good; 2 = excellent
16. Taking into account clinical considerations, your evaluation of the methodology used, and the statistical power of the study, are you certain that the overall effect is due to the exposure being investigated? | Yes/No | Yes = 1; No = 0 | Trichotomous rating (b): 0 = poor; 1 = good; 2 = excellent
17. Are the results of this study directly applicable to the patient group targeted in this guideline? | Yes/No | Yes = 1; No = 0 |

Checklist items shown in bold in the original table are those removed from the original checklist; here they are the items listed without a dichotomous score (2, 4, 6, 8 and 9)

(a) Question 15: To score a perfect 2, the study must be a prospective study with no selection and recruitment bias, must have a follow-up rate >80 % and must have controlled for confounders. A prospective study that met some, but not all, of these criteria scores a 1. A retrospective study that has controlled for confounders also scores a 1. A highly biased retrospective or prospective study that has no control receives a score of 0

(b) Question 16: A study has sufficient statistical power if, for every predictor evaluated, there are at least 10 participants. Starting from its score on question 15, a study can go up one point, go down one point or stay the same, depending on whether its sample size meets this basic rule
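The scoring rules in Table 1 and the quality cut-offs described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the list of scored items is read off Table 1 (items with a dichotomous or trichotomous score), and treating an unrated item as 0 is an assumption that the text does not state.

```python
# Items that carry a score in the modified checklist, as read from Table 1.
# Items 15 and 16 are rated 0-2; the remaining scored items are rated 0-1.
BINARY_ITEMS = (1, 3, 5, 7, 10, 11, 12, 13, 14, 17)
TRICHOTOMOUS_ITEMS = (15, 16)


def total_score(ratings):
    """Sum the item ratings (dict: item number -> rating), checking ranges."""
    total = 0
    for item in BINARY_ITEMS + TRICHOTOMOUS_ITEMS:
        value = ratings.get(item, 0)  # assumption: an unrated item counts as 0
        upper = 2 if item in TRICHOTOMOUS_ITEMS else 1
        if not 0 <= value <= upper:
            raise ValueError(f"item {item}: rating {value} is outside 0-{upper}")
        total += value
    return total


def quality_tier(score):
    """Map a 0-14 total to the POOR / GOOD / EXCELLENT classes used here."""
    if not 0 <= score <= 14:
        raise ValueError("score must lie between 0 and 14")
    if score < 7:
        return "POOR"
    if score <= 9:
        return "GOOD"
    return "EXCELLENT"


def meets_power_rule(n_participants, n_predictors):
    """Footnote (b): at least 10 participants for every predictor evaluated."""
    return n_participants >= 10 * n_predictors
```

With ten items worth one point and two worth two points, the maximum total is 14, which matches the 14-point scale named in the abstract. The one-point adjustment of question 16 is left out because the text does not specify exactly how it is applied.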

For each study, the association between various clinical factors and surgical outcome, evaluated by the (modified) Japanese Orthopaedic Association scale (mJOA/JOA), Nurick score or “other” measures, was extracted. A relationship between the outcome and a predictor was defined as conditional if it was significant for certain groups of patients but not others, or with one statistical test but not another.
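The extraction and tiered comparison described above amount to counting, for each predictor, how many studies reported each type of association within cumulative quality tiers. A minimal sketch follows; the record layout, the names `Finding`, `TIER_SETS` and `tally`, and the example records are all hypothetical and invented for illustration, not data from this review.

```python
from collections import Counter
from dataclasses import dataclass

# Cumulative quality tiers compared in the review.
TIER_SETS = {
    "EXCELLENT": {"EXCELLENT"},
    "GOOD + EXCELLENT": {"GOOD", "EXCELLENT"},
    "POOR + GOOD + EXCELLENT": {"POOR", "GOOD", "EXCELLENT"},
}


@dataclass(frozen=True)
class Finding:
    study: str        # study label
    tier: str         # "POOR", "GOOD" or "EXCELLENT"
    predictor: str    # e.g. "duration", "age", "baseline severity"
    scale: str        # e.g. "mJOA/JOA", "Nurick", "other"
    association: str  # "negative", "positive", "non-significant", "conditional"


def tally(findings, predictor, tier_set):
    """Count association types for one predictor within a cumulative tier."""
    allowed = TIER_SETS[tier_set]
    return Counter(
        f.association
        for f in findings
        if f.predictor == predictor and f.tier in allowed
    )


# Placeholder records, invented purely to exercise the function.
EXAMPLE = [
    Finding("Study A", "EXCELLENT", "duration", "mJOA/JOA", "negative"),
    Finding("Study B", "EXCELLENT", "duration", "Nurick", "non-significant"),
    Finding("Study C", "GOOD", "duration", "mJOA/JOA", "negative"),
    Finding("Study D", "POOR", "duration", "Nurick", "conditional"),
    Finding("Study E", "GOOD", "age", "mJOA/JOA", "non-significant"),
]
```

Calling `tally(EXAMPLE, "duration", "GOOD + EXCELLENT")` counts two negative and one non-significant finding among the placeholder records. Filtering additionally on `scale` would give per-scale counts of the kind reported in the "Significance by scale" column of Table 3.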

Results

This review consisted of 37 POOR [13–49], 38 GOOD [50–87] and 16 EXCELLENT [88–103] articles (Table 2).
Table 2

List of EXCELLENT, GOOD and POOR studies included in systematic review

Rating | Number | Articles | Outcome
Excellent (10–14) | 16 | Cheng et al. (2009); Furlan et al. (2011); King et al. (2005); Rajshekhar and Kumar (2005); Suzuki et al. (2009); Zhang et al. (2011); Chibbaro et al. (2006); Hasegawa et al. (2002); Morio et al. (2001); Shin et al. (2010); Tanaka et al. (1999); Ebersold et al. (1995); Kim et al. (2008); Nagashima et al. (2011); Suri et al. (2003); Zhang et al. (2010) | Nurick: 5; mJOA/JOA: 12; Cooper: 1
Good (7–9) | 38 | Alafifi et al. (2007); Choi et al. (2005); Fujimura et al. (1998); Hamanishi et al. (1996); Hirai (1991); Hukuda et al. (1985); Kawaguchi et al. (2003); Koyanagi et al. (1993); Masaki et al. (2007); Nagata et al. (1996); Park et al. (2006); Singh et al. (2001); Wiberg (1986); Chen et al. (2009); Chung et al. (2002); Fujiwara (1987); Handa et al. (2002); Holly et al. (2008); Kadanka et al. (2005); Kiris and Kilincer (2008); Lu et al. (2008); Mastronardi et al. (2007); Ogawa et al. (2004); Satomi et al. (2001); Uchida et al. (2005); Wohlert et al. (1984); Chen et al. (2001); Emery et al. (1998); Guidetti and Fortuna (1969); Heidecke et al. (2000); Huang et al. (2003); Kato et al. (1998); Koc et al. (2004); Lyu et al. (2004); Nagashima et al. (2006); Okada et al. (1993); Singh et al. (2009); Wada et al. (1999) | Nurick: 5; mJOA/JOA: 27; Walking Test: 3; EMS: 1; NCSS: 1; Other: 3
Poor (<7) | 37 | Agrawal et al. (2004); Arnold et al. (1993); Chagas et al. (2005); Gok et al. (2009); Houten and Cooper (2003); Iwasaki et al. (2002); Kim et al. (2007); Magnaes and Hauge (1980); Moussa et al. (1983); Naruse et al. (2009); Saunders et al. (1991); Sunago (1982); Yamazaki et al. (2003); Ahn et al. (2010); Bertalanffy and Eggert (1998); Chiles et al. (1999); Gregorius et al. (1976); Iencean (2007); Iwasaki et al. (2007); Kumar et al. (1999); Matsuda et al. (1999); Naderi et al. (1998); Phillips (1973); Scardino et al. (2010); Wang et al. (2003); Arnasson et al. (1987); Bishara (1971); Fessler et al. (1998); Hamburger et al. (1997); Igarashi et al. (2011); Kawaguchi et al. (2000); Lee et al. (1997); Matsuoka et al. (2001); Naderi et al. (1996); Ryu et al. (2010); Sinha and Jagetia (2010); Wang et al. (2004) | Nurick: 10; mJOA/JOA: 18; NCSS: 1; SF-36: 1; Lees & Turner: 2; Other: 7

Italicized articles were the ones translated from Japanese to English

mJOA/JOA (modified) Japanese Orthopaedic Association scale, EMS European Myelopathy Score, NCSS Neurosurgical Cervical Spine Scale, SF-36 Short Form-36

Fifteen of the 16 EXCELLENT studies controlled for confounding variables when examining the association between outcome and age, duration of symptoms and baseline severity score. The outcome measure used in all EXCELLENT studies was either the Nurick or the mJOA/JOA, with one study commenting on both.

Duration of symptoms

Thirteen EXCELLENT articles evaluated duration of symptoms as a predictor of surgical outcome. Nine reported a negative, three a non-significant and one a conditional relationship. Outcome, assessed on both the mJOA/JOA and Nurick scales, therefore appears to depend on preoperative duration of symptoms: three times as many articles reported a negative association as a non-significant one. The R values for this negative relationship ranged from weak to strong [95] (Table 3).
Table 3

Evaluation of duration of symptoms, age and baseline severity score as predictors of surgical outcome by EXCELLENT, GOOD + EXCELLENT and POOR + GOOD + EXCELLENT studies

Factor

Total (Significant/NS)

Significant(+,−)

Conditional

Significance by scale

Strength of association

Conclusions

EXCELLENT

 

 Duration of Symptoms

13(9, 3) → ([90, 94, 95, 97, 99, 100, 101, 102, 103], [89, 91, 98])

9(0, 9) → (0, [90, 94, 95, 97, 99, 100, 101, 102, 103])

1→[93]

Nurick: 3

mJOA/JOA: 6

Other: 1

R = −0.231, p = 0.05 [95]

R = −0.53, p = 0.04 (diabetic*) [93]

Longer duration of symptoms preoperatively results in a worse surgical outcome on all scales

 Age

16(6, 8) → ([88, 94, 95, 99, 102, 103], [89, 90, 92, 96, 97, 98, 100, 101])

6(0, 6) → (0, [88, 94, 95, 99, 102, 103])

2 → [91, 93]

Nurick: 3

mJOA/JOA: 5

Other: None

R = −0.38, p = 0.0031 [95]

R = −0.52, p < 0.01 (Nurick) [91]

R = −0.43, p < 0.01 (mJOA) [91]

Age is a potential predictor of outcome evaluated using the Nurick score or the mJOA/JOA

 Baseline Severity Score

9(8,1) → ([89, 94, 95, 97, 98, 101, 102, 103], [90])

8(7, 1) → ([89, 94, 95, 98, 101, 102, 103], [97])

Nurick: 2

mJOA/JOA: 7

Other: None

R = 0.61, p = 0.0001 [95]

R = −0.65, p < 0.05 (diabetic*) [93]

More severe preoperative myelopathy results in a worse outcome on the mJOA/JOA scale. The association between Nurick and baseline severity score is unclear

GOOD + EXCELLENT

 Duration of Symptoms

39(25, 10) → ([56, 57, 58, 60, 63, 69, 72, 73, 75, 77, 78, 79, 83, 85, 86, 87, 90, 94, 95, 97, 99, 100, 101, 102, 103], [51, 53, 55, 62, 64, 68, 84, 89, 91, 98])

25(0, 25) → (0, [56, 57, 58, 60, 63, 69, 72, 73, 75, 77, 78, 79, 83, 85, 86, 87, 90, 94, 95, 97, 99, 100, 101, 102, 103])

4 → [54, 59, 76, 93]

Nurick: 4

mJOA/JOA: 21

Other: 5

R = −0.47, p = 0.000039 [56]

R = −0.57, p = 0.0001 (sub-acute*) [58]

R = −0.82, p = 0.0001 (insidious onset*) [58]

R = −0.60, p < 0.01 (CSM*) [69]

R = −0.64, p < 0.01 (OPLL*) [69]

R = −0.231, p = 0.05 [95]

R = −0.463, p = NA (OPLL*) [77]

R = −0.401, p = NA (CSM*) [77]

R = −0.459, p = 0.0039 [78]

R = −0.364, p = NA [83]

R = −0.401, p = 0.043 (elderly*) [59]

R = −0.225, p = 0.041 [59]

R = −0.53, p = 0.04 (diabetic*) [93]

Longer duration of symptoms results in worse surgical outcome, evaluated on both the mJOA/JOA and other measures. The association between duration of symptoms and outcome on Nurick was inconclusive

 Age

50(17, 27) → ([52, 56, 61, 65, 71, 72, 75, 79, 81, 86, 87, 88, 94, 95, 99, 102, 103], [51, 53, 54, 58, 59, 60, 62, 63, 64, 66, 68, 70, 73, 77, 78, 80, 83, 84, 85, 89, 90, 92, 96, 97, 98, 100, 101])

17(1, 16) → ([81], [52, 56, 61, 65, 71, 72, 75, 79, 86, 87, 88, 94, 95, 99, 102, 103])

6 → [57, 69, 74, 76, 91, 93]

Nurick: 3

mJOA/JOA: 19

Other: 2

R = −0.28, p = 0.000046 [56]

R = −0.443, p = 0.005 [71]

R = −0.384, p = 0.0031 [95]

R = 0.23, p = 0.047 [81]

R = −0.52, p < 0.01 (Nurick**) [91]

R = −0.43, p < 0.01 (mJOA**) [91]

R = −0.65, p = 0.01 (diabetes*) [93]

R = −0.37, p < 0.05 (OPLL*) [69]

R = −0.46, p < 0.05 (CDH*) [69]

Age is not a significant predictor of outcome using any measure

 Baseline severity score

38(22, 12) → ([50, 52, 54, 55, 62, 63, 64, 65, 67, 68, 80, 81, 82, 86, 89, 94, 95, 97, 98, 101, 102, 103], [51, 56, 58, 71, 72, 77, 78, 79, 83, 84, 85, 90])

22(17, 5) → ([50, 52, 54, 55, 63, 65, 67, 68, 82, 86, 89, 94, 95, 98, 101, 102, 103], [62, 64, 80, 81, 97])

4 → [53, 57, 59, 76]

Nurick: 7

mJOA/JOA: 18

Other: 3

R = −0.65, p < 0.001 (MDI**) [81]

R = 0.64, p < 0.001 [55]

R = 0.37, p = 0.018 [67]

R = 0.69, p = 0.009 [68]

R = 0.611, p < 0.0001 [95]

R = 0.8412, p = NA (CSM*) [82]

R = 0.9261, p = NA (OPLL*) [82]

R = 0.387, p = 0.015 (younger*) [59]

R = 0.333, p = 0.009 [59]

More severe preoperative myelopathy is associated with a worse outcome on mJOA/JOA scale. Baseline severity is significantly related to outcome on the Nurick, but the direction of this association is unclear. The association between baseline severity and outcome evaluated using other measures was not evident

POOR + GOOD + EXCELLENT

 Duration of symptoms

63(42,16) → ([13, 16, 17, 18, 19, 22, 26, 31, 35, 36, 37, 41, 42, 43, 44, 45, 49, 56, 57, 58, 60, 63, 69, 72, 73, 75, 77, 78, 79, 83, 85, 86, 87, 90, 94, 95, 97, 99, 100, 101, 102, 103], [15, 23, 25, 32, 33, 38, 51, 53, 55, 62, 64, 68, 84, 89, 91, 98])

42(0,42) → (0, [13, 16, 17, 18, 19, 22, 26, 31, 35, 36, 37, 41, 42, 43, 44, 45, 49, 56, 57, 58, 60, 63, 69, 72, 73, 75, 77, 78, 79, 83, 85, 86, 87, 90, 94, 95, 97, 99, 100, 101, 102, 103])

5 → [48, 54, 59, 76, 93]

Nurick: 11

mJOA/JOA: 28

Other: 11

R = −0.47, p = 0.000039 [56]

R = −0.57, p = 0.0001 (sub-acute*) [58]

R = −0.82, p = 0.0001 (insidious onset*) [58]

R = −0.60, p < 0.01 (CSM*) [69]

R = −0.64, p < 0.01 (OPLL*) [69]

R = −0.597, p < 0.05 [35]

R = −0.539, p < 0.0001 [36]

R = −0.231, p = 0.05 [95]

R = −0.463, p = NA (OPLL*) [77]

R = −0.401, p = NA (CSM*) [77]

R = −0.459, p = 0.0039 [78]

R = −0.364, p = NA [83]

R = −0.401, p = 0.043 (elderly*) [59]

R = −0.225, p = 0.041 [59]

R = −0.53, p = 0.04 (diabetic*) [93]

Longer duration of symptoms preoperatively results in a worse surgical outcome on all scales

 Age

74(28,40) → ([16, 17, 19, 28, 29, 35, 38, 45, 48, 49, 52, 56, 60, 61, 65, 71, 72, 75, 79, 81, 86, 87, 88, 94, 95, 99, 102, 103], [14, 15, 18, 21, 22, 23, 24, 25, 33, 36, 39, 40, 42, 43, 51, 53, 54, 58, 59, 60, 62, 63, 66, 68, 70, 73, 77, 78, 80, 83, 84, 85, 89, 90, 92, 96, 97, 98, 100, 101])

28(1,27) → ([81], [16, 17, 19, 28, 29, 35, 38, 45, 48, 49, 52, 56, 60, 61, 65, 71, 72, 75, 79, 86, 87, 88, 94, 95, 99, 102, 103])

6 → [57, 69, 74, 76, 91, 93]

Nurick: 4

mJOA/JOA: 25

Other: 7

R = −0.28, p = 0.000046 [56]

R = −0.443, p = 0.005 [71]

R = −0.384, p = 0.0031 [95]

R = 0.23, p = 0.047 [81]

R = −0.52, p < 0.01 (Nurick**) [91]

R = −0.43, p < 0.01 (mJOA**) [91]

R = −0.65, p = 0.01 (diabetic*) [93]

R = −0.37, p < 0.05 (OPLL*) [69]

R = −0.46, p < 0.05 (CDH*) [69]

Age is not a significant predictor of outcome on any scale

 Baseline severity score

56(35,17) → ([18, 20, 22, 24, 26, 28, 29, 31, 32, 36, 40, 45, 46, 50, 52, 54, 55, 62, 63, 64, 65, 67, 68, 80, 81, 82, 86, 89, 94, 95, 97, 98, 101, 102, 103], [14, 21, 34, 35, 43, 51, 56, 58, 71, 72, 77, 78, 79, 83, 84, 85, 90])

35(29, 6) → ([18, 20, 24, 26, 28, 29, 31, 32, 36, 40, 45, 46, 50, 52, 54, 55, 63, 65, 67, 68, 82, 86, 89, 94, 95, 98, 101, 102, 103], [22, 62, 64, 80, 81, 97])

4 → [53, 57, 59, 76]

Nurick: 8

mJOA/JOA: 25

Other: 8

R = −0.65, p < 0.001 (MDI**) [81]

R = 0.45, p < 0.0001 [20]

R = 0.64, p < 0.001 [55]

R = 0.37, p = 0.018 [67]

R = 0.69, p = 0.009 [68]

R = 0.38, p = 0.0027 [36]

R = 0.61, p < 0.0001 [95]

R = 0.84, p = NA (CSM*) [82]

R = 0.93, p = NA (OPLL*) [82]

R = 0.39, p = 0.015 (younger*) [59]

R = 0.33, p = 0.009 [59]

More severe preoperative myelopathy is associated with a worse outcome on mJOA/JOA scale. Baseline severity is significantly related to outcome on the Nurick, but the direction of this association is unclear

An R value between −1.0 and −0.5 or 0.5 and 1.0 was classified as a strong correlation, between −0.5 and −0.3 or 0.3 and 0.5 as moderate and between −0.3 and −0.1 or 0.1 and 0.3 as weak

mJOA/JOA (modified) Japanese Orthopaedic Association Scale, CSM cervical spondylotic myelopathy, OPLL ossification of the posterior longitudinal ligament, NA not applicable/not given, CDH cervical disc herniation, MDI Myelopathy Disability Index

*R value and p values reported for specific patient groups

**R value and p value reported for a specific outcome measure when more than one scale was used to evaluate outcome
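The strength bands given in the footnote to Table 3 can be expressed as a small helper. This is a sketch rather than the authors' procedure: the function name is hypothetical, the footnote's bands share their boundary values (0.3 and 0.5), so assigning a boundary value to the stronger band is an assumption, and the label for |R| below 0.1 is also an assumption because the footnote does not cover that range.

```python
def correlation_strength(r):
    """Classify a correlation coefficient using the bands in the Table 3 footnote."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    magnitude = abs(r)  # the bands are symmetric around zero
    if magnitude >= 0.5:
        return "strong"
    if magnitude >= 0.3:
        return "moderate"
    if magnitude >= 0.1:
        return "weak"
    return "unclassified"  # assumption: below 0.1 is not covered by the footnote
```

Applied to values from Table 3, R = −0.231 [95] falls in the weak band, R = −0.384 [95] in the moderate band and R = 0.61 [95] in the strong band.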

Baseline severity score

Nine articles reported on the relationship between baseline severity score and surgical outcome. One article suggested a negative, seven a positive and one a non-significant association. All seven studies that used JOA as the primary outcome measure demonstrated that more severe preoperative myelopathy is predictive of a worse outcome. On the Nurick scale, one article reported a positive, one a negative and one a non-significant relationship, making it difficult to draw a conclusion about the predictive value of preoperative severity on Nurick. One study recorded a strong R value (0.61) for this positive association [95] (Table 3).

Age

All 16 articles explored the importance of age on surgical outcome. Six studies found a negative, eight a non-significant and two a conditional relationship. Broken down by scale, two articles reported a negative association and two no association between age and Nurick. Four studies found a negative relationship and six no relationship between age and JOA/mJOA. The association between age and outcome evaluated by Nurick is therefore unclear, whereas age may not be predictive of outcome on the JOA/mJOA scale. The two conditional studies were not included in this count. Furlan et al. [91] found that age was a significant predictor of outcome on both scales using multiple regression, but not after dichotomizing the mJOA outcome. In addition, Kim et al. [93] suggested that age was an important predictor, but only in patients with diabetes. The R values for this negative relationship ranged from moderate to strong (Table 3).

Most of the GOOD articles were not rated excellent due to flaws in their statistical analysis such as a lack of control for confounding variables. In contrast to the EXCELLENT studies, the GOOD studies used a wider variety of scales and measures to assess outcome such as the Cooper, neurosurgical cervical spine scale (NCSS), neurological assessments, questionnaires and evaluation of symptom improvement.

Duration of symptoms

Thirty-nine GOOD or EXCELLENT articles investigated duration of symptoms as a potential predictor of outcome. Twenty-five reported a negative, ten a non-significant and four a conditional relationship. There is a clear negative association between duration of symptoms and outcome evaluated on both the JOA/mJOA scale and other measures: 22 articles identified a negative relationship versus 8 a non-significant one. Using the Nurick scale, on the other hand, the results were inconclusive: four articles reported a negative association and four no association. Inclusion of the conditional articles did not alter these results. The R values for this negative relationship ranged from weak to strong (Table 3).

Baseline severity score

Thirty-eight studies assessed baseline severity score as a potential predictor of surgical outcome. Five reported a negative, 17 a positive, 12 a non-significant and 4 a conditional relationship. Outcome evaluated by mJOA/JOA is positively dependent on the baseline severity score: 15 papers suggested a positive association, while only 8 reported a non-significant relationship. The relationship between baseline score and Nurick score is harder to define: two papers reported a negative and four a positive association. Baseline score therefore appears to be a significant predictor of Nurick, but the direction of the relationship is unclear. With respect to all the other outcome measures, two studies suggested a negative and three a non-significant relationship. The R values of this association ranged from moderate to strong (Table 3).

Age

Fifty articles reported on age as a predictor of outcome. Sixteen identified a negative, one a positive, 27 a non-significant and 6 a conditional association between age and outcome. Age was not found to be a predictor of outcome, whether assessed using the Nurick, mJOA/JOA or other measures. Two papers found that age had a negative association with Nurick and 14 with JOA/mJOA. Five studies, on the other hand, reported no relationship with Nurick and 18 no relationship with JOA/mJOA. It is important to incorporate the conditional studies into this analysis, especially those that used JOA/mJOA as the primary outcome measure. Both Nagashima et al. and Ogawa et al. [74, 76] identified age as a significant predictor of outcome in more severe myelopathy groups, but not in moderate severity (10–12) groups. Furlan et al. [91] identified age as an important negative predictor using multiple regression, but not stepwise logistic regression. Finally, Koyanagi et al. [69] suggested that age was a significant predictor in patients with OPLL and CDH, but not CSM. Incorporating these results into our assessment of age as a predictor, we still conclude that it is not a significant predictor. The R values of the significant associations ranged from weak to strong (Table 3).

The POOR studies had significant flaws, including weaknesses in study design, poor statistical power and poor control for confounders. In addition, many of these studies used unreliable outcome measures to evaluate surgical improvement and suffered on their ratings as a result.

Duration of symptoms

Sixty-three articles explored duration of symptoms as a predictor of surgical outcome. Forty-two reported a negative, 16 a non-significant and 5 a conditional relationship. The results were clear for all outcome measures: a longer duration of symptoms was predictive of a worse outcome. The R value of this association was reported in six studies and ranged from weak to strong (Table 3).

Baseline severity score

Fifty-six papers assessed baseline severity score as a predictor of outcome. Twenty-nine reported a positive, 17 a non-significant, 6 a negative and 4 a conditional association. Baseline severity score was a definite positive predictor of outcome assessed using the JOA/mJOA and other measures: 22 papers reported a positive association versus 10 a non-significant one. The relationship between baseline severity score and Nurick was inconclusive: three papers identified a negative, four a positive and two a non-significant association. As in the GOOD + EXCELLENT analysis, it is evident that preoperative severity is related to Nurick score, but the direction of this association is unclear. The R value of this positive association was reported in six studies and ranged from moderate to strong (Table 3).

Age

Seventy-four studies commented on age as a potential predictor. Twenty-seven identified a negative, 1 a positive, 40 a non-significant and 6 a conditional association. Age was not a significant predictor of outcome, assessed using either the Nurick score or other measures. Nine papers reported a non-significant relationship between age and Nurick, whereas only four suggested a negative relationship. Without looking at the conditional associations, JOA/mJOA was also not dependent on age as indicated by 25 articles reporting no relationship and 20 suggesting a negative one. Five articles identified a conditional association between age and mJOA/JOA. The results from these studies do not affect these conclusions. The R values from studies that reported an association ranged from weak to moderate (Table 3).

Other predictors

Articles included in this review also explored the predictive value of other factors including gender, signs and symptoms, disease progression pattern and various comorbidities. The results from these studies are displayed in Table 4. There is insufficient evidence in the literature to conclude that the presence of a particular sign, symptom or comorbidity is predictive of outcome.
Table 4

Other prognostic indicators of surgical outcome reported either by EXCELLENT, GOOD or POOR studies

Co-morbidities

 Number of co-morbid diseases

Furlan et al. [91]: number of comorbidities was not associated with postoperative Nurick or mJOA using stepwise logistic regression

King et al. [94]: a greater number of comorbid diseases resulted in a worse outcome assessed by the Cooper scale

Nagata et al. [75]: a greater number of other health issues contributed to a poor surgical outcome in the older group

 Presence of co-morbid disease

Houten and Cooper [25]: presence of comorbidities is not related to outcome

 Diabetes

Chen et al., Kawaguchi et al. [30, 51]: diabetes is not a significant predictor of outcome

Kim et al., Choi et al. [53, 93]: diabetes is related to a worse outcome

 Psychological disorders

Kumar et al. [32]: patients in the poor outcome group had greater emotional problems than those in the good outcome group

 Smoking

Kim et al. [93]: smoking status did not affect outcome in control group. In diabetes group, smoking increased the risk of an unfavorable outcome

Signs and symptoms

 Lower extremity dysfunction

Lee et al. [33]: not a significant predictor of outcome

Gregorius et al. [23]: presence of lower extremity weakness is associated with a worse outcome

 Upper extremity dysfunction

Lee et al. [33]: not a significant predictor of outcome

Magnaes and Hauge [34]: presence of arm symptoms is positively associated with leg outcome

 Bowel/bladder dysfunction

Houten and Cooper, Lee et al. [25, 33]: not a significant predictor of outcome

Gregorius et al., Sinha and Jagetia [23, 45]: presence of bladder/bowel dysfunction is associated with a worse outcome

 Babinski sign

King et al. [94]: presence of a Babinski sign was associated with a worse outcome

Zhang et al. [102]: presence of a Babinski sign was related to a better outcome

Alafifi et al. [50]: a positive Babinski sign was predictive of a worse outcome in patients with either a N/Hi or Lo/Hi MRI

 Leg spasticity/spastic gait

Gregorius et al. [23]: not a significant predictor of outcome

Alafifi et al., Bertalanffy and Eggert [17, 50]: presence of leg spasticity is associated with a worse outcome

Chiles et al. [20]: presence of a spastic gait was predictive of a poor outcome using the Wilcoxon rank sum test but not multivariate analysis

 Hyperreflexia and hand atrophy

Alafifi et al. [50]: both these signs were predictive of a worse outcome in patients with either a N/Hi or Lo/Hi MRI

Chiles et al. [20]: hand atrophy was associated with a poorer outcome using multivariate analysis

 Sexual dysfunction

 Lower extremity numbness

 Hand clumsiness

 Clonus

 Gait impairment

Found to be negative predictors of outcome by single studies [45, 46, 50, 94]

 Upper extremity atrophy

 Radicular pain

 Lower cervical pain

 Cervical ROM

 Long tract signs

Found to be insignificant predictors of outcome by single studies [23, 33]

Other

 Gender

17 articles reported that gender is not a significant predictor of outcome [19, 22, 23, 24, 29, 33, 40, 51, 52, 53, 54, 58, 59, 71, 81, 85, 91]

Emery et al. [55]: males showed greater improvement following surgery than females

 Race

Race is not a predictor of outcome [23]

 Onset of symptoms

Patients with a gradual onset of symptoms have a worse outcome [17]

 Disease progression

Patients with a slower progressing disease have a better outcome [37]

mJOA, modified Japanese Orthopaedic Association scale; N/Hi, normal/high; Lo/Hi, low/high; MRI, magnetic resonance imaging

Discussion

This review compared the results from the EXCELLENT, GOOD + EXCELLENT and POOR + GOOD + EXCELLENT papers (Table 5).
Table 5

Summary of results: percentages of articles reporting a negative, positive, non-significant or conditional association between surgical outcome and duration of symptoms, baseline severity score or age

 

                           Negative   Positive   Non-significant   Conditional

Excellent
 Duration of symptoms      69 %       NA         23 %              8 %
 Baseline severity score   11 %       78 %       11 %              NA
 Age                       37.5 %     NA         50 %              12.5 %

Good + excellent
 Duration of symptoms      65 %       NA         26 %              10 %
 Baseline severity score   13 %       45 %       31.5 %            10.5 %
 Age                       32 %       2 %        54 %              12 %

Poor + good + excellent
 Duration of symptoms      67 %       NA         25 %              8 %
 Baseline severity score   11 %       52 %       30 %              7 %
 Age                       36.5 %     1.5 %      54 %              8 %

Bold values indicate the relationship between predictor and outcome with the highest percentage
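The percentages in Table 5 are simple proportions of the papers reporting each direction of association. As a minimal sketch, the snippet below recomputes them for the full (poor + good + excellent) group from the paper counts given in the Results for baseline severity score (56 papers) and age (74 papers). The names `counts`, `direction_percentages` and `most_frequent_direction` are illustrative and not part of the review's methodology, and values are rounded to one decimal place here, so they may differ slightly from the rounding used in the table.

```python
# Paper counts by reported direction of association, as stated in the
# Results text for the poor + good + excellent group.
counts = {
    "Baseline severity score": {
        "negative": 6, "positive": 29, "non-significant": 17, "conditional": 4,
    },
    "Age": {
        "negative": 27, "positive": 1, "non-significant": 40, "conditional": 6,
    },
}


def direction_percentages(direction_counts):
    """Convert per-direction paper counts into percentages of all papers."""
    total = sum(direction_counts.values())
    return {d: round(100 * n / total, 1) for d, n in direction_counts.items()}


def most_frequent_direction(direction_counts):
    """Return the direction reported by the largest number of papers."""
    return max(direction_counts, key=direction_counts.get)


for predictor, c in counts.items():
    print(predictor, direction_percentages(c), most_frequent_direction(c))
```

Tallied this way, the most frequently reported direction is positive for baseline severity score and non-significant for age, in line with the conclusions drawn from the full group of studies.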

One of the major findings of this review was that patients with a longer duration of symptoms and a more severe baseline score are more likely to have an unfavorable surgical result. The rationale behind this finding is that both severe and chronic, longstanding compression of the spinal cord may lead to irreversible damage due to demyelination and necrosis of the gray matter.

Second, controversy exists in the literature as to the significance, strength and direction of the relationship between surgical outcome and age. Age was a non-significant predictor on all the scales when looking at the GOOD + EXCELLENT and the POOR + GOOD + EXCELLENT studies. When looking at only the higher quality studies (modified SIGN ≥ 10), however, age went from a non-significant predictor to a potential predictor. Although most surgeons will not discriminate on the basis of age, they should be aware that elderly patients may not translate neurological recovery into functional improvement as well as younger patients. Potential explanations for this discrepancy include: (1) the elderly experience age-related changes in their spinal cord, including a decrease in γ-motoneurons, number of anterior horn cells and number of myelinated fibers in the corticospinal tracts and posterior funiculus, (2) older patients are more likely to have unrelated comorbidities that may affect outcome or (3) the elderly may not be able to conduct all activities on a certain functional scale due to these comorbidities (e.g. walking time may be affected by osteoarthritis) [35, 75, 88, 92, 93].

Finally, our review determined that factors such as signs (hyperreflexia, leg spasticity and Babinski sign), symptoms (gait impairment, clumsy hands and numbness), comorbidities (diabetes and psychological issues), and smoking status do carry some predictive value. Physicians should progressively incorporate predictive modeling into their practices to provide valuable prognostic information to their patients and direct appropriate treatment programs. When evaluating a CSM patient’s likely surgical outcome, the surgeon must weigh the patient’s preoperative severity, duration of symptoms and age accordingly while keeping in mind the ability of other factors to affect the outcome.

As shown in this review, results may differ depending on what scale is used to evaluate surgical outcome. This may be due to limitations in the scales rather than an indication of the actual association between the predictor and outcome. The Nurick score has lower sensitivity: it is graded out of five and is largely weighted towards lower limb function [104]. When outcome was assessed using the Nurick score, its association with various predictors was less conclusive. For example, duration of symptoms was significantly associated with Nurick score when looking at the EXCELLENT and the POOR + GOOD + EXCELLENT groups, but was a questionable predictor in the GOOD + EXCELLENT group. In addition, in the GOOD + EXCELLENT and POOR + GOOD + EXCELLENT studies, there was a significant relationship between preoperative condition and Nurick, but the direction of the association was not evident. The articles that identified a negative association, however, had more biased samples: Gok et al., Huang et al. and Rajshekhar and Kumar [22, 62, 97] all had stricter inclusion criteria. On the other hand, the results were more definite when the outcome was evaluated on either the JOA or mJOA score: a longer duration of symptoms and a more severe baseline severity score were associated with a worse outcome. The mJOA and JOA are widely accepted standards for CSM assessment and separately evaluate lower and upper limb, sphincter and sensory function. Although the JOA has been validated and shown to have high inter- and intra-rater reliability [105], its modified version has not.

In a research setting, when looking at a relationship between various factors and outcome, it is important to control for the confounders baseline severity score and duration of symptoms. When assessing statistical control in our review to rate the articles, we ensured that the studies controlled for age, duration of symptoms and baseline severity as these were identified as important predictors by Holly et al. [5]. According to this review, age may be a less important confounder. Few articles reported on the R values for the significant associations between various clinical factors and outcome. This makes it difficult for clinicians and researchers to evaluate the strength of these correlations.

Holly et al. [5] indicated that the limitations of their review were that there were very few prospective studies, that many studies assessed the outcome using unvalidated measures and that it was hard to analyze functional outcome due to the use of different scales between studies. Our study had a much larger pool of articles and consisted of higher quality literature, including some prospective studies that evaluated outcome using the validated JOA scale or Nurick score. There was also a sufficient number of articles to compare predictors on the same scale. In addition, the differences in our methodology, including a comparison of results among the three groups, also allowed for the incorporation of quality assessment in the analysis. Since Japanese researchers have contributed substantially to research in the field of spinal cord injury, including the creation of the JOA scale, we also translated all Japanese articles into English and incorporated them into our analysis. Finally, our systematic review differed from that of Holly et al. as it included a preliminary analysis of other predictors including signs and symptoms, comorbidities, gender and smoking status to determine their predictive value.

There are limitations to our study: (1) we did not separate studies based on length of follow-up time; (2) articles that dichotomized a predictor (e.g. age) might have used different cut-offs and (3) some of the articles with relevant abstracts or titles were excluded because they were not available or were in a language other than Japanese or English. Future systematic reviews should address these limitations to provide a completely unbiased evaluation of important predictors of outcome.

The results from this review should encourage further exploration in this area. Even though many studies have examined important predictors of surgical outcome in CSM, there remains a lack of evidence in the form of high quality, prospective studies using validated outcome measures. A large prospective analysis is required to reemphasize the predictive value of duration of symptoms and baseline severity score, to settle the controversy surrounding age and to confirm that signs, symptoms and comorbidities do impact surgical results.

Conflict of interest

None.

Copyright information

© The Author(s) 2013

Open Access. This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.

Authors and Affiliations

  • Lindsay A. Tetreault (1)
  • Alina Karpova (1)
  • Michael G. Fehlings (1)
  1. Krembil Neuroscience Center, Toronto Western Hospital, University of Toronto, Toronto, Canada