What is success of treatment? Expected outcome scores in cervical radiculopathy patients were much higher than the previously reported cut-off values for success

Treatment success can be defined by asking a patient how they perceive their condition compared to prior to treatment, but it can also be defined by establishing success criteria in advance. We evaluated treatment outcome expectations in patients undergoing surgery or non-operative treatment for cervical radiculopathy. The first 100 consecutive patients from an ongoing randomized controlled trial (NCT03674619) comparing the effectiveness of surgical and nonsurgical treatment for cervical radiculopathy were included. Patient-reported outcome measures and expected outcome and improvement were obtained before treatment. We compared these with previously published cut-off values for success. Arm pain, neck pain and headache were measured by a numeric rating scale. The Neck Disability Index (NDI) was used to record pain-related disability. We applied the Wilcoxon signed-rank test to compare the expected outcome scores for the two treatments. Patients reported a mean NDI of 42.2 (95% CI 39.6–44.7) at baseline. The expected mean NDI one year after the treatment was 4 (95% CI 3.0–5.1). The expected mean reduction in NDI was 38.3 (95% CI 35.8–40.8). Calculated as a percentage change score, the patients expected a mean reduction of 91.2% (95% CI 89.2–93.2). Patient expectations for arm pain, neck pain and working ability were higher for surgical treatment (P < 0.001), but not for headache. The expected improvement after treatment of cervical radiculopathy was much higher than the previously reported cut-off values for success. Patients with cervical radiculopathy had higher expectations of surgical treatment.


Introduction
Neck pain is among the leading causes of years lived with disability worldwide [1]. Cervical radiculopathy usually involves both neck pain and arm pain. Two systematic reviews have found no clear benefits of surgery over nonsurgical treatments [2,3]. Moreover, one review indicated that cervical radiculopathy is a self-limiting condition in most cases [4].
Patient expectations might be important for post-treatment outcomes and satisfaction [5][6][7]. Satisfaction is not necessarily equivalent to fulfilled expectations [8]. However, having one's expectations fulfilled alone was the most significant predictor of a good outcome for patients undergoing lumbar decompression surgery [9].
In assessing the success rate after treatment, the minimal clinically important difference (MCID) is generally used as a guideline. The estimate of MCID focuses on the patient's perception of benefit alone. A single question is used to rate the perception of change after treatment. The validity of this anchor is rarely questioned, even though it represents the retrospective perception of global change, often ranging from completely recovered to worse. This method has been criticized as being "tautological" because one subjective measure is validated by another subjective measure [10,11]. MCID was originally defined as "the smallest difference in score in domain of interest which patients perceive as beneficial and which would mandate, in the absence of troublesome side effects and excessive cost, a change in patient's management" [12]. The wording was later revised and replaced by minimal important difference (MID) [13,14]. Later research has focused on the difference between those who rate themselves as slightly improved and those who rate themselves as unchanged. Benefits, side effects, costs, changes in health care utilization and continued drug consumption, all of which can be measured objectively, are not incorporated in the anchor [15]. In an editorial from 2010, Eugene Carragee predicted that the anchor-based method would be a historical oddity, like a success rate based on the surgeon's impression of his own good work [10]. In contrast, the literature in this field reflects the opposite trend.
The concept of success has been aligned to an improvement that reflects a substantial amount of change rather than minimal change [16,17]. Success may simply be defined as the achievement of a desired result, meeting a defined range of expectations [18,19]. A recent study, based on data from the Norwegian Registry for Spine Surgery (NORspine), found a change of 13.5 points on the NDI as a criterion for success after surgery for cervical radiculopathy [20]. The criteria for success were obtained from the patient-perceived benefit of an operation, by asking the patient at follow-up and using the global perceived effect scale (GPE) as an anchor [20]. Similarly, the benefit of the treatment can be explored by defining the criteria for success through the improvement a patient expects a priori.
The improvement a patient expects in order to undergo surgery or non-operative treatment for cervical radiculopathy has, to our knowledge, not been examined. Therefore, we wanted to evaluate treatment outcome expectations by asking the patients to fill in their expected improvement at baseline and to compare these with previously published cut-off values for success.

Study population
This is a cross-sectional study. This report is consistent with the Strengthening The Reporting of Observational Studies in Epidemiology (STROBE) statement [21]. The first 100 consecutive patients from an ongoing randomized controlled trial comparing the effectiveness of surgical and nonsurgical treatment for cervical radiculopathy were included [22]. The aim of this study was to compare treatment outcome expectations with established cut-off values for success. We did not compare the treatment outcome expectations with the actual outcome at one year because reported expectations were derived from the ongoing clinical trial. The study participants were outpatient clinic patients referred to Oslo University Hospital for treatment of cervical radiculopathy consistent with disc herniation or spondylosis at levels C5/C6 and/or C6/C7 between October 2018 and August 2020. Demographics and patient-reported outcome measures (PROMs) were obtained at baseline before treatment and before patients were allocated to either surgery or conservative treatment. All scores were filled in by the patient with no involvement of the authors. Informed consent was obtained from all participants. The trial was approved by the Norwegian ethics committee, REK 2017/2125, and registered at www.clinicaltrials.gov as NCT03674619 on September 17, 2018.
Inclusion criteria
• Aged 20-65 years
• Study 1: Neck and arm pain for at least 3 months, and a corresponding herniation involving one cervical nerve root (C6 or C7)
• Study 2: Neck and arm pain for at least 3 months, with corresponding spondylosis involving C6 and/or C7
• Arm pain intensity of at least 4 on a scale from 0 (no pain) to 10 (worst possible pain)
• Willing to accept randomisation to either of the treatment alternatives
• Neck disability index (NDI) > 30%

Exclusion criteria
Patients with any previous cervical fractures or cervical spine surgery; signs of myelopathy; rapidly progressive paresis or paresis < grade 4; pregnancy; arthritis involving the cervical spine; infection or active cancer; generalized pain syndrome; serious psychiatric or somatic disease that excludes one of the treatment alternatives; concomitant shoulder disorders that may interfere with the outcome; abuse of medication/narcotics; inability to understand written Norwegian; and unwillingness to accept one of the treatment alternatives.

Outcome measures
The Neck Disability Index consists of ten questions about pain-related disability, including items such as headaches, concentration problems, reading issues and sleep disturbances. Each item is rated on a 6-point scale from 0 (no disability) to 5 (full disability). The numeric responses for the items are summed to a score from 0 to 50 and then transformed into a total score ranging from 0 to 100% (higher scores representing more severe symptoms). The Norwegian version has been validated in patients with neck pain and with cervical radiculopathy [23,24]. Arm pain was measured by a numeric rating scale (NRS) from 0 (no pain) to 10 (worst imaginable pain) [25].
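The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic (ten items scored 0 to 5, summed, and rescaled to 0-100%); the function name `ndi_percent` and the example answers are hypothetical and are not taken from the study data.

```python
def ndi_percent(item_scores):
    """Return the NDI total as a percentage (0-100) from ten item scores of 0-5."""
    if len(item_scores) != 10:
        raise ValueError("the NDI has ten items")
    if any(not 0 <= score <= 5 for score in item_scores):
        raise ValueError("each item is scored from 0 to 5")
    raw_sum = sum(item_scores)      # raw score, 0 to 50
    return raw_sum * 100 / 50       # rescaled to 0-100%


# Hypothetical answers for one patient (not study data).
example_answers = [2, 3, 2, 1, 2, 3, 2, 2, 2, 2]
print(ndi_percent(example_answers))  # 42.0
```

How missing items should be handled is not described in the text, so the sketch simply rejects incomplete questionnaires.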
Patient expectations ahead of treatment. The patients were asked to fill out the Neck Disability Index, imagining that they were at one year post-treatment, and to select the lowest category (poorest result) they would be content with for each item. This is the written instruction that was given: "We will now ask about your expectations. Imagine that it's been one year since you were included in the study and that you are satisfied. Please tick off one box for each of the following ten questions. For each question choose the one category that you can be satisfied with." The patients were also asked to answer a Global Score about what they expected their symptoms to be like one year post-treatment (ranging from much worse to much better) [26]. This was registered separately for arm pain, neck pain, headaches and ability to work. The seven response options were much worse, worse, slightly worse, unchanged, slightly better, better, and much better. We asked the patients to report what they would expect if they received the surgical treatment and what they would expect if they received the nonsurgical treatment.
Emotional distress was assessed by the 10-question version of the Hopkins Symptom Check List (HSCL-10) [30].

Statistical analysis
Continuous data are presented as means with standard deviation (SD) and 95% confidence intervals (CI), or median with interquartile range (IQR). Categorical data are presented as numbers (n) and percentages (%). We applied the Wilcoxon signed-rank test to compare the expected outcome scores for the two treatments. Expected NDI percentage change was calculated by subtracting the expected NDI score from the baseline NDI score, dividing the difference by the baseline NDI score and multiplying by 100. All statistical analyses were performed using SPSS (version 26). A P value < 0.01 was set as the level of statistical significance.
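The percentage change calculation can be written out as a small Python sketch. It only restates the formula given in the text, (baseline NDI minus expected NDI) divided by baseline NDI, times 100; the function name `expected_percent_change` and the example scores are hypothetical and do not come from the study data.

```python
def expected_percent_change(baseline_ndi, expected_ndi):
    """Expected NDI improvement as a percentage of the baseline NDI score."""
    if baseline_ndi <= 0:
        # Percentage change is undefined without a positive baseline score.
        raise ValueError("baseline NDI must be greater than 0")
    return (baseline_ndi - expected_ndi) * 100 / baseline_ndi


# Hypothetical patient: baseline NDI of 40, expected NDI of 4 at one year.
print(expected_percent_change(40, 4))  # 90.0
```

Because the score is computed per patient, the mean of these percentages need not equal the percentage computed from the mean baseline and mean expected scores.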
Out of the 100 patients included in this study, one failed to complete the expected NDI and was excluded from the analyses. Another nine patients misinterpreted the expected NDI questionnaire. They thought they were asked about the symptoms at the present time point rather than symptoms one year ahead in time. These patients scored almost identically for NDI and for expected NDI in one year. We crosschecked their expectations for arm pain, neck pain, headaches and working ability, and they expected almost no pain and much better working ability one year ahead in time. After a discussion within the study group, these patients were excluded from the analyses.

Results
The study population comprised 53 females and 47 males aged 29 to 63 years. Of these, 20% were daily smokers, 69% were married or cohabiting, 58% used painkillers daily, and 44% were sick-listed 50% or more. Median duration of arm pain was 7.7 months (Table 1).
The expected mean NDI one year after the treatment was 4 (95% CI 3.0-5.1). The expected mean reduction in NDI was 38.3 (95% CI 35.8-40.8). Calculated as a percentage change score, the patients expected a mean reduction of 91.2% (95% CI 89.2-93.2) (Table 2). Patient expectations for arm pain, neck pain and working ability were higher for surgical treatment, P < 0.001 (Fig. 1). For expected headache in one year, there was no significant difference in expected scores, P = 0.537.

Discussion
The Neck Disability Index (NDI) is a frequently used patient-reported outcome measure in cervical radiculopathy. The score ranges from 0 to 100, and the higher the score, the worse the pain and disability. A systematic review reported the minimal important difference (MID) to vary from 10 to 38 [31]. A cohort study of patients who underwent cervical fusion for degenerative spine conditions calculated an MID of 15 and a substantial clinical benefit (SCB) of 20 [15]. A difference of 10 is most commonly used for sample size calculation in trials comparing various surgical procedures for cervical radiculopathy [32].
Compared to the previously reported criteria for success for Norwegian patients undergoing anterior cervical decompression and fusion for cervical radiculopathy, our patients have much greater expectations for success [20]. Figure 2 shows that almost all of our patients, in order to be satisfied, expect a greater improvement than the proposed criterion of 13.5 NDI points. Figure 3 demarcates the gap between the expected improvement observed and the previously reported criteria for success that were described as being of particular importance in distinguishing between a successful and an unsuccessful outcome. An NDI percentage change score of 35% or more was thought to be a highly sensitive cut-off value for success in the study by Mjåset et al., but in our study, the patients expect the NDI to improve by 91% after treatment for cervical radiculopathy [20].
Compared to NDI percentage change in other interventional studies, the expected improvement remains relatively high. Reported mean NDI after treatment varies between 21 and 28, representing a percentage change of 30-60%, depending on the baseline score [32][33][34][35].
We estimated the patients' expectation of an outcome. By doing this, we simulated scoring the desired result. When the expectation score is achieved after treatment, the aim is fulfilled and the outcome may be regarded as a success. By using this method, the lower level of the 95% confidence interval for the expected improvement was 35.8 on the NDI as compared with 13.5 in a recent study on criteria for success after surgery for cervical radiculopathy [20]. The corresponding lower level of percentage improvement was 89.2% compared with 35% [20]. This major discrepancy questions the use of MID as a success criterion for spine surgery [36,37]. In a more recent systematic review, Copay et al. consider MID values that are lower than the measurement error or minimal detectable change (MDC) as problematic and debate the concept of MID [38,39]. We agree that it is questionable to report that a reduction from 10/10 to 8/10 on VAS is a success, or even to 7/10, which corresponds more closely to the 30% improvement suggested in many studies [37]. The interpretation has many aspects because the scales are statistically treated as linear, while an improvement from 5/10 to 3/10 is likely to be different from an improvement from 10/10 to 8/10. More importantly, we must take into consideration side effects, costs, change in health care utilization and continued drug consumption when discussing whether a treatment was successful.
MID is defined and estimated in a variety of ways in the literature, and there is a lack of formal agreement on which methods are superior [38,39]. The validity of using a subjective anchor to estimate success after treatment may be questioned because of bias and estimation error in the anchor. Likewise, we assume that it is difficult for patients to score their expected NDI one year post-treatment. Nevertheless, both ways of estimating success may be within the definition of the term. The measurement error is the minimal difference that can reliably be estimated in an individual patient. When MID or success rate is used in a shared decision with the patient before treatment, we also have to consider the individual measurement error of an outcome in order to inform the patient about the expected outcome.
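The VAS examples in the preceding paragraph can be made concrete with a short Python sketch. It only works through the arithmetic of the three pain reductions mentioned in the text; the helper name `percent_improvement` is hypothetical, and the sketch makes no claim about which change patients would regard as a success.

```python
def percent_improvement(before, after):
    """Relative improvement in percent, given scores before and after treatment."""
    if before <= 0:
        raise ValueError("the score before treatment must be greater than 0")
    return (before - after) * 100 / before


# The three pain reductions discussed in the text (scores out of 10).
for before, after in [(10, 8), (10, 7), (5, 3)]:
    points = before - after
    percent = percent_improvement(before, after)
    print(f"{before}/10 to {after}/10: {points} points, {percent:.0f}% improvement")
# 10/10 to 8/10: 2 points, 20% improvement
# 10/10 to 7/10: 3 points, 30% improvement
# 5/10 to 3/10: 2 points, 40% improvement
```

The same 2-point change is a 20% improvement from 10/10 but a 40% improvement from 5/10, which is one reason a fixed point cut-off and a fixed percentage cut-off can classify the same patient differently.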

Limitations and strengths of the study
The main limitation of this study is the use of a non-validated method to explore the patient expectations. Although the NDI itself is validated in Norwegian patients with cervical radiculopathy, the question about expectations of treatment is not validated. We did not change the NDI in any way. In this study, we emphasised that the following questions were related to patient expectations and not the present situation. This proved to be insufficient in 9% of the participants, as they apparently did not pay attention to the introduction and answered the questions in the same manner as they did for the baseline NDI. Asking patients about their expectations of treatment is not straightforward. There is always uncertainty related to how an individual interprets the question. Are they telling us about their realistic expectations, minimal expectations or perhaps their hopes? Even though all patients were provided with balanced information prior to inclusion and randomization in the study, these are surgical patients referred for neck surgery. Patients were informed that the two treatments are both very good and probably equally effective. One cannot exclude that this led to higher expectations regardless of treatment. Regarding the outcome after surgery, we applied the results from a recent multicenter RCT to which our hospital contributed most patients [32]. This trial compared the efficacy of an artificial disc with the traditional surgical method used in the present study. Furthermore, we applied the results from a systematic review concluding that there is sparse evidence that the efficacy of surgery is superior to conservative treatment [3]. By informing the patients in this way, we used the existing evidence, but because the definition of success is debated, we might have been too optimistic in informing the patients. On the other hand, our information adds to all the other information the patients had received preceding inclusion in the present study.
All scores were filled in by the patient with no involvement of the authors. Bearing in mind that having one's expectations fulfilled alone was the most significant predictor of a good outcome for patients undergoing lumbar decompression surgery [9], one must not underestimate the importance of patient expectations. It would be interesting to compare the expected outcome with actual outcome scores at one year. However, the data were derived from an ongoing RCT. The author of the current paper and the statistician are blinded to treatment allocation. We chose not to extract the primary outcome data until the end of the trial. This is in line with good reporting practice and the CONSORT recommendations [40]. Patients in this study come from Health Region South-East in Norway, covering a population of about 2.9 million inhabitants. This region accounts for the majority of patients registered in NORspine, so these patients are in fact very similar to the ones used in the recent study on criteria for success after surgery for cervical radiculopathy [20].
We were surprised to find that so many patients expected complete or almost complete relief, but the instructions for filling out the questionnaire were very clear and we explicitly asked them to fill in the lowest value they would accept to be satisfied with the treatment outcome. A validation using qualitative methods, interviewing a randomly selected group of patients, would have been interesting, but this is rarely conducted in studies using the retrospective anchor method and was considered to be outside the scope of the present study.
Some of the strengths of this study are that we had a relatively large and homogeneous group of patients who had undergone thorough evaluation by a neurosurgeon and a specialist in physical medicine and rehabilitation. They reported a high level of pain and disability at baseline, all measured by validated tools: NRS for arm pain, neck pain and headache, and NDI for pain-related disability.

Conclusions
The mean expectation of about 90% improvement on the primary outcome as observed in this study suggests that a level of 30% improvement is critically low as a definition of success. Future studies using both qualitative and quantitative methods are warranted to explore this field. This is important because readers, researchers and stakeholders often interpret success as a substantial improvement, rather than as a change that is merely different from, or at worst slightly above, the measurement error.

Author contributions
All authors listed have contributed sufficiently to the project to be included as authors, and all those who are qualified to be authors are listed in the author by-line. JIB had the original idea for this study and secured funding. The first draft of the manuscript was written by MT. All authors commented on previous versions of the manuscript. All authors have read and approved the final manuscript and stand by the integrity of the entire work.

Funding
Open access funding provided by University of Oslo (incl Oslo University Hospital). The authors are grateful to the Southern and Eastern Norway Regional Health Authority for funding this study. The sponsor had no role in designing this trial, writing the manuscript, or in the collection, analysis, and interpretation of data.

Data availability
The datasets used and/or analysed during the current study are available from the corresponding author on reasonable request.

Conflict of interest
The authors have no relevant financial or non-financial interests to disclose.

Ethics approval
The trial was approved by the Norwegian ethics committee, REK 2017/2125.

Informed consent
Informed consent was obtained from all participants.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.