The results suggest that GA mainly reflects the patient's pain status: it correlates well with the condition-specific VAS and ODI, as well as with pain-specific items within the quality-of-life PROMs. This was expected, given that GA specifically asks about the change in perceived pain, and it is consistent with the results of a recent study [14]. As also expected, the degree of association with EQ-5D and with SF-36 domains other than “Bodily Pain” was lower. The directions of the correlations were as postulated.
GA further appears to be an appropriate discriminator between a successful outcome and failure, since there were clear cut-off points in VAS and ODI between patients assessing themselves as “pain free or much better” and those assessing themselves as “somewhat better, unchanged or worse”. This was not seen with EQ-5D. Our results indicate that GA, used as a reference criterion (for instance, in the calculation of minimal important change thresholds), can increase the interpretability of pain-specific PROMs. It is debatable whether GA is applicable as an anchor for generic quality-of-life measures, since it specifically asks about pain. However, GABACK showed correlation coefficients with EQ-5D similar to those with ODI in all three diagnosis groups, which illustrates the importance of pain as a determinant of quality of life in individuals with degenerative spinal disorders. To increase the accuracy and decrease the bias of a transition question, it is recommended that the question address a specific construct and be anchored to a specific point in time. We believe that these criteria are fulfilled for GA [22].
Our study further suggests that respondents, irrespective of diagnosis, base their answers primarily on their present symptom state and only to a lesser degree on how their pain has changed over time, although the latter is what an ideal transition question would capture. Over recent decades, several reports have supported this finding [11, 13, 23]. Two theories frequently used to explain this phenomenon are recall bias, as described by Ross in his paper on implicit theory of change [24], and response shift, as defined by Sprangers and Schwartz [9]. Consequently, the validity of this type of question may be questioned. Response shift, however, applies to all PROMs, and one can argue that neither a PROM measured at two points in time ≥ 1 year apart nor a retrospective PROM is an appropriate outcome measure. Because of these limitations, the concept of a patient acceptable symptom state (PASS) [25] was introduced. It suggests that a satisfactory outcome of surgery should be expressed not in terms of change but as an absolute score at follow-up beyond which patients report an acceptable symptom state. Van Hooff et al. [26] proposed an ODI threshold of 22 for such a state in a population of lumbar spine surgery patients. GA could serve as the reference criterion for calculating acceptable symptom states for VAS, ODI, and other condition-specific PROMs. With GABACK/SUCCESS as the reference criterion, the corresponding PASS thresholds for ODI and VASBACK were 27 and 32.5, respectively, in the DDD group. For the whole population, the PASS was also 27 for ODI and 31.5 for VASBACK.
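To make the idea of a PASS threshold concrete, the following minimal sketch shows one common way to derive a cut-off from a follow-up score and a dichotomized reference criterion: choosing the score that maximizes Youden's J (sensitivity + specificity − 1). This is an assumption for illustration only; the procedure actually used to obtain the thresholds reported here is not restated in this section. The function name `pass_threshold` and the toy data are hypothetical and are not Swespine data.

```python
def pass_threshold(scores, success):
    """Return (cutoff, sensitivity, specificity) maximizing Youden's J.

    scores  : follow-up scores where LOWER is better (e.g. ODI or VAS)
    success : booleans, True if the patient met the reference criterion
              (assumed here: GA answered "pain free or much better")
    """
    n_pos = sum(success)
    n_neg = len(success) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one success and one failure")

    best = None
    for cutoff in sorted(set(scores)):
        # A score at or below the cutoff is classified as "acceptable state".
        tp = sum(1 for s, y in zip(scores, success) if y and s <= cutoff)
        tn = sum(1 for s, y in zip(scores, success) if not y and s > cutoff)
        sens = tp / n_pos
        spec = tn / n_neg
        j = sens + spec - 1
        if best is None or j > best[0]:
            best = (j, cutoff, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical toy data: ten follow-up ODI scores and the GA-based outcome.
odi = [8, 12, 20, 24, 26, 30, 34, 40, 46, 52]
ga_success = [True, True, True, True, True, False, True, False, False, False]

cutoff, sens, spec = pass_threshold(odi, ga_success)
print(f"cutoff={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}")
```

Other criteria for picking the cut-off exist (for example, the 75th percentile of scores among patients in an acceptable state), and they can yield different thresholds from the same data.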
The discriminative ability of ODI and VASBACK for DDD, tested in ROC curve analyses with GABACK as the reference criterion, was above 90% (interpreted as “excellent”) for the post-operative scores and above 86% (“good”) for the Δ-scores. Compared with the study by Hägg et al. [15], which demonstrated a sensitivity and specificity of 75% for the Δ-scores, the current real-life database analysis strengthens the case for GA as a reference criterion.
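As an illustration of what such a discriminative-ability figure measures, the sketch below computes the area under the ROC curve as the probability that a randomly chosen patient classified as a success by the reference criterion has a better (lower) score than a randomly chosen patient classified as a failure, with ties counted as one half. The function name `auc_lower_is_better` and the toy data are hypothetical, and this pairwise formulation is a standard equivalent of the ROC area rather than a description of the software used in the study.

```python
def auc_lower_is_better(scores, success):
    """Area under the ROC curve for a score where LOWER means better.

    Computed as the fraction of (success, failure) pairs in which the
    success patient has the lower score; ties count as half a pair.
    """
    pos = [s for s, y in zip(scores, success) if y]
    neg = [s for s, y in zip(scores, success) if not y]
    if not pos or not neg:
        raise ValueError("need at least one success and one failure")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p < n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not Swespine data.
vas_back = [8, 12, 20, 24, 26, 30, 34, 40, 46, 52]
ga_success = [True, True, True, True, True, False, True, False, False, False]

auc = auc_lower_is_better(vas_back, ga_success)
print(f"AUC = {auc:.3f}")
```

An AUC of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination; the percentages quoted above are this quantity expressed on a 0–100 scale.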
There are several limitations to the present study. The percentage of responders 1 year postoperatively in Swespine, and therefore in our study, was 55–68%, which would be unacceptable in a randomized controlled study. Registry-based studies often have many non-responders. The analyses that compare PROMs require that all PROMs are completed both before and after surgery. As a consequence, the number of eligible patients for these analyses decreases substantially, which creates a risk of selection bias. The results of the non-responder analysis, however, suggest that this risk is low. There is evidence that a high number of missing values does not distort the outcome data in a registry [27]. Nevertheless, the implications of non-response bias need further investigation.
If there were a gold standard for assessing effectiveness in spine surgery, it would of course be preferable to correlate GA with it. The fact that outcome instruments such as GA are themselves used as proxies for a gold standard prevents the use of common psychometric methods for validity testing.
Overall, the results indicate that GA, used as a reference criterion, can increase the interpretability of pain-specific PROMs. However, its applicability as an anchor for the quality-of-life PROMs is less obvious. On the one hand, the results support the use of GA as an outcome measure of the present state of pain and physical function; on the other hand, they demonstrate its limitations as a measure of change.
The findings of this study are exploratory and need to be confirmed in a study that examines potential confounders. Further studies are needed to explore how variables such as gender, age, smoking habits, and social status affect the response pattern of GA. How GA works in a non-surgically treated population has also not been explored. Since GA appears to have the same function as a PASS, a preliminary ODI threshold of 27 is suggested, which is rather high compared with the threshold of 22 reported by van Hooff et al. [26]. This difference needs further investigation.