Introduction

Low back-related leg pain or sciatica is one of the common variations of low back pain (LBP) [1, 2].

The literature suggests that the presence of sciatica is responsible for a poor prognosis in LBP patients [3–6]. Although definitions of sciatica used in epidemiological surveys and clinical practice vary, sciatic pain is generally defined as pain radiating to the leg, normally below the knee and into the foot and toes, with varying neurological findings [7].

A recent review of sciatica prevalence studies reported substantial variation in estimates, ranging from 1.6 to 43 % [8]. The definition of sciatic symptoms seemed to explain most of the variation. The same applies to back pain prevalence, which led to a recent consensus study on standardising back pain definitions for use in prevalence studies so that heterogeneity in findings is minimised [9]. Dionne et al. [9] reported that sciatica prevalence is important in back pain research and suggested that in self-report studies ‘sciatica’ should be replaced by ‘pain that goes down the leg’. It has also been suggested that ‘pain below the knee’ is a good proxy for clinically diagnosed sciatica, and a number of studies using self-reported information have used ‘pain below the knee’ to define sciatica [10–14], although other studies have employed more stringent self-report definitions such as ‘pain radiating to the leg(s) that worsens with coughing or sneezing’ [15]. At present, it is not known whether self-reported symptoms of sciatica correlate with the clinical diagnosis, or whether ‘pain below the knee’ or ‘pain that goes down the leg’ is a reasonable proxy for the presence of sciatic symptoms.

As sciatica is considered a poor prognostic indicator in back pain presentations and may also require a different therapeutic approach to simple back pain [16, 17], accurate definitions are important for estimates of prevalence and natural history and for evaluating treatment outcome according to presentation in epidemiological studies.

The purpose of this study was twofold: to assess the agreement between self-reported leg pain and clinically defined sciatica (nerve root involvement) and, if necessary, to identify an optimum cluster of self-report items for the diagnosis of sciatica and assess its accuracy.

Methods

Subjects and design

Patients with LBP, with or without leg pain, participated in a randomised controlled trial (RCT) investigating the effectiveness of physiotherapy back pain treatments in primary care. All participants were referred by their GP. Of the 851 RCT participants, 511 reported low back pain radiating to the leg(s) and had complete data on the self-reported items and a recorded diagnosis of their leg pain. These 511 participants formed the sample for this analysis. Details of the RCT protocol have been reported elsewhere [18].

Patient self-reported items

The participants completed a self-administered questionnaire at baseline on self-report measures of leg pain of spinal origin capturing area of pain, frequency and severity, effect of coughing or sneezing, description of pain quality and the presence or absence of numbness or tingling (Table 1). The questionnaire was compiled by identifying potential self-reported indicators of nerve root involvement from the literature [7, 19, 20].

Table 1 Self-reported items

Clinical examination

Within 10–15 min after completing the questionnaire, all the participants underwent a clinical examination by a physiotherapist. A total of 15 physiotherapists participated. The physiotherapists were blinded to the patient self-reported items. The assessment consisted of history taking in terms of pain distribution, quality of pain, easing and aggravating factors, sensory disturbances, frequency, severity and bothersomeness of leg pain. The physical examination consisted of lumbar mobility assessment, neurological testing (myotomes, reflexes, sensation) and neural tension tests (SLR, femoral stretch). All participating physiotherapists had undergone training in the assessment of back pain with leg pain. The physiotherapists were required to classify patients’ symptoms as: (a) nerve root involvement, (b) no nerve root involvement, or (c) possible nerve root involvement but not conclusive, according to their clinical judgment based on clinical findings.

Data analysis

The clinical diagnosis was taken as the reference standard. Two diagnostic classifications were considered, which differ in how the category ‘possible nerve root involvement but not conclusive’ is pooled: (1) with the ‘no nerve root involvement’ category (the confirmatory classification, intended to target unequivocal clinical cases only); (2) with the ‘nerve root involvement’ category (the indicative classification, intended to target ‘possible’ as well as confirmed cases). Self-reported items were compared with the reference standard using sensitivity, specificity, predictive values, likelihood ratios (LRs) and the area under the receiver operating characteristic (ROC) curve (c statistic, AUC), which provides the average-weighted sensitivity/specificity across the range of scale values/categories. Point estimates and 95 % confidence intervals were derived for each statistic. In addition to the univariable comparisons, binary logistic regression analyses were performed to evaluate the accuracy of the multiple items most independently predictive of a clinical diagnosis of nerve root involvement (or sciatica). The classification cut-off for a sciatica diagnosis was set at the customary predicted probability of P > 0.5 for the primary analysis, although this was varied to assess the impact of different cut-offs on the discriminative ability of the multivariable model. Two types of multivariable model are presented: (1) approach 1 (full-entry model), based on the inclusion of all items observed; (2) approach 2, using manual forward-selection regression (with entry of items restricted to the most significant independent variables) to identify the most efficient combinations of items for discriminating a clinical diagnosis of nerve root impingement. The latter approach consisted of different models built sequentially by adding variables one by one in order of predictive ability on multivariable testing.
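As an illustration of the statistics listed above, the following minimal sketch computes them from a single 2 × 2 cross-tabulation of a self-report item against the reference standard. The class and function names (`TwoByTwo`, `diagnostic_accuracy`) and the example counts are hypothetical and are not taken from the study data; confidence intervals are omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-tabulating one self-report item against the reference standard."""
    tp: int  # item positive, clinical diagnosis positive
    fp: int  # item positive, clinical diagnosis negative
    fn: int  # item negative, clinical diagnosis positive
    tn: int  # item negative, clinical diagnosis negative


def diagnostic_accuracy(t: TwoByTwo) -> dict:
    """Return point estimates only; assumes no marginal total is zero
    and that specificity is neither 0 nor 1 (otherwise an LR is undefined)."""
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": t.tp / (t.tp + t.fp),  # positive predictive value
        "npv": t.tn / (t.tn + t.fn),  # negative predictive value
        "lr_pos": sensitivity / (1 - specificity),  # positive likelihood ratio
        "lr_neg": (1 - sensitivity) / specificity,  # negative likelihood ratio
    }


# Hypothetical counts for illustration only; NOT taken from the study tables.
example = TwoByTwo(tp=60, fp=40, fn=40, tn=60)
stats = diagnostic_accuracy(example)
for name, value in stats.items():
    print(f"{name}: {value:.2f}")
```

Under the two diagnostic classifications described above, the same item would give two different tables, because the ‘possible’ patients move between the reference-negative and reference-positive columns.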

Performance of the diagnostic model

Greater tool discrimination is reflected by: sensitivity, specificity, predictive value and AUC closer to 1; higher positive LR; lower negative LR. Likelihood ratios (LRs) from 2 to 5 represent ‘small’ increases in the post-test probability of disease, from 5 to 10 represent ‘moderate’ increases, and above 10 represent ‘large’ increases; correspondingly 0.2–0.5 reflect ‘small’ decreases in the post-test probability, 0.1–0.2 reflect ‘moderate’ decreases and <0.1 reflect ‘large’ decreases [21, 22]. Hosmer and Lemeshow [23] provided the following classification system for the AUC: 0.7 ≤ AUC < 0.8 = ‘Acceptable discrimination’; 0.8 ≤ AUC < 0.9 = ‘Excellent discrimination’; AUC ≥ 0.9 = ‘Outstanding discrimination’. From the logistic model, the Nagelkerke R² denotes the proportion of variance explained by the model (values closer to 1 reflect a more valid tool).
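The interpretive bands above can be written as simple lookups, sketched below. The function names are hypothetical. The handling of shared boundary values is an assumption, because the quoted ranges overlap at their end points: here an LR of exactly 5 is treated as ‘moderate’ and an LR of exactly 0.2 as ‘small’. The last function shows the standard odds form of Bayes' theorem that links an LR to the change in post-test probability.

```python
def lr_band(lr: float) -> str:
    """Label a likelihood ratio using the bands quoted in the text [21, 22].

    Boundary handling is an assumption (the quoted ranges share end points).
    """
    if lr >= 1:  # positive LR: raises the post-test probability
        if lr > 10:
            return "large increase"
        if lr >= 5:
            return "moderate increase"
        if lr >= 2:
            return "small increase"
        return "below 'small' increase"
    # negative LR: lowers the post-test probability
    if lr < 0.1:
        return "large decrease"
    if lr < 0.2:
        return "moderate decrease"
    if lr <= 0.5:
        return "small decrease"
    return "below 'small' decrease"


def auc_band(auc: float) -> str:
    """Label an AUC using the Hosmer and Lemeshow categories quoted in the text [23]."""
    if auc >= 0.9:
        return "Outstanding discrimination"
    if auc >= 0.8:
        return "Excellent discrimination"
    if auc >= 0.7:
        return "Acceptable discrimination"
    return "below 'Acceptable discrimination'"


def post_test_probability(pre_test: float, lr: float) -> float:
    """Convert a pre-test probability and an LR into a post-test probability via odds."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


print(auc_band(0.67))  # an AUC just under the 0.7 threshold
print(lr_band(1.5))    # a positive LR under the floor of 2
```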

Results

Demographic information on age and gender is presented in Table 2. On clinical examination, 37.0 % (189/511) of the patients reporting low back and leg pain were classified by the assessing physiotherapist as having nerve root pain. A further 17.0 % (87/511) were documented as ‘possible neural/inconclusive’. These numbers are the basis of the reference standard diagnostic comparisons for which the self-reported items were checked for diagnostic accuracy for sciatica.
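The short sketch below shows how the reported counts translate into the number of reference-positive patients under each diagnostic classification defined in the Data analysis section. The counts (511, 189, 87) are those reported above; the function name is hypothetical.

```python
# Counts reported in the Results for the 511 participants.
TOTAL = 511
NERVE_ROOT = 189  # classified as nerve root involvement
POSSIBLE = 87     # 'possible nerve root involvement but not conclusive'


def reference_positives(classification: str) -> int:
    """Number of patients treated as clinically positive under each pooling rule."""
    if classification == "confirmatory":
        # 'possible' cases are pooled with 'no nerve root involvement'
        return NERVE_ROOT
    if classification == "indicative":
        # 'possible' cases are pooled with 'nerve root involvement'
        return NERVE_ROOT + POSSIBLE
    raise ValueError(f"unknown classification: {classification}")


for name in ("confirmatory", "indicative"):
    n = reference_positives(name)
    print(f"{name}: {n}/{TOTAL} = {100 * n / TOTAL:.1f} %")
# confirmatory: 189/511 = 37.0 %
# indicative: 276/511 = 54.0 %
```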

Table 2 Demographics

Table 3 presents the cross-tabulated frequency data of individual self-report items versus clinical classification of nerve root involvement. Data on the diagnostic accuracy of individual items are presented in Table 4. Sensitivity and specificity of the individual items varied widely, although average sensitivity/specificity (as designated by the AUC value) was above 0.6 for three items: ‘pain below knee’, ‘which pain worst’ and ‘numbness, pins and needles’ (the AUC for the other items being in the range of 0.5–0.6). Sensitivity was over 50 % for the ‘pain below knee’ and ‘numbness, pins and needles’ items, and for certain cut-offs of ‘frequency of pain’, ‘severity of pain’ and ‘which pain worst’. Specificity above 50 % was observed for all items. In relation to classification 1 (confirmatory), positive predictive values were generally in the range of 0.4–0.5 whilst negative predictive values were 0.6–0.8 (i.e. NPVs were mostly higher). In contrast, in relation to classification 2 (indicative), positive predictive values were generally in the range of 0.55–0.75 whilst negative predictive values were 0.45–0.65 (i.e. PPVs were mostly higher). For most items the positive likelihood ratios exceeded 1 and the negative likelihood ratios were less than 1, and in most cases these were statistically significant (in relation to a null hypothesis of LR = 1). However, the likelihood ratios were small: positive likelihood ratios were mostly below 2, the floor for a ‘small’ LR+, and negative likelihood ratios were mostly above 0.5, the ceiling for a ‘small’ LR−.

Table 3 Descriptive cross-tabulated frequency data of individual self-report items versus clinical classification of nerve root involvement in patients with low back pain and leg pain
Table 4 Diagnostic accuracy of individual self-report items to identify nerve root involvement (based on clinical assessment) in patients with low back pain and leg pain

Associations, both univariable and multivariable, between the self-report items and clinical diagnosis of nerve root involvement are shown in Table 5. In univariable testing, all items except ‘toothache’ (and ‘shooting pain’ in the confirmatory diagnostic classification) were significantly associated with the clinical diagnosis. However, only three items were significant independent predictors (P < 0.05) in the full multivariable analysis: ‘pain below the knee’, ‘which pain worst’ (leg pain only versus not leg pain only) and ‘numbness, pins and needles’ (the last being significant when tested against the indicative diagnostic classification). The item ‘coughing/sneezing’ was associated with the confirmatory diagnostic classification at the level of 0.05 < P < 0.1, but was excluded from the multivariable forward-selection model as it was not strongly associated with both diagnostic classifications. Diagnostic statistics for four multivariable models are shown in Table 6: the full-entry model (which included all items) and three manual forward-selection models, the first based on ‘pain below the knee’ only, the second additionally including ‘which pain worst’ (leg pain only versus not leg pain only), and the third including both of these plus ‘numbness, pins & needles’. For the full model, the selected categorisation cut-off for ‘frequency of pain’ was ‘nearly all the time’, for ‘severity of pain’ it was ‘7–10’, and for ‘which pain worst’ it was ‘leg pain’ [on the basis that these gave the highest odds ratios in the tests of association (Table 5)]. The full model yields an AUC of 0.74 against the confirmatory classification and 0.76 against the indicative classification, and explains 23 and 27 % of the variance, respectively. These are only marginally better fits than the forward-selection model that includes the three most predictive items [Model 2(iii), Table 6].
The models were more specific than sensitive in relation to the confirmatory diagnostic reference, but were similarly sensitive and specific in relation to the indicative reference. Differences were also seen in the predictive values: the models tended to yield higher NPVs than PPVs with respect to the confirmatory diagnostic classification, whereas PPVs and NPVs were similar with respect to the indicative reference. Different values for these diagnostic statistics can be obtained by moving the classification cut-off away from 0.5. Examples are given in the legend of Table 6, and show that sensitivity increases and specificity decreases when the cut-off is lowered (e.g. to 0.3), whereas the opposite applies when the cut-off is raised above 0.5 (e.g. to 0.7).
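The trade-off between sensitivity and specificity as the cut-off moves can be sketched as follows. The predicted probabilities and reference labels are hypothetical values chosen only to illustrate the direction of the effect; they are not model output from this study, and the function name is an assumption.

```python
def sens_spec_at_cutoff(probs, labels, cutoff):
    """Classify as sciatica when the predicted probability exceeds `cutoff`,
    then return (sensitivity, specificity) against the reference labels."""
    tp = sum(1 for p, y in zip(probs, labels) if p > cutoff and y == 1)
    fn = sum(1 for p, y in zip(probs, labels) if p <= cutoff and y == 1)
    tn = sum(1 for p, y in zip(probs, labels) if p <= cutoff and y == 0)
    fp = sum(1 for p, y in zip(probs, labels) if p > cutoff and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical predicted probabilities and reference labels (1 = clinical sciatica).
probs = [0.10, 0.20, 0.25, 0.35, 0.40, 0.45, 0.55, 0.60, 0.65, 0.75, 0.80, 0.90]
labels = [0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 0, 1]

results = {c: sens_spec_at_cutoff(probs, labels, c) for c in (0.3, 0.5, 0.7)}
for cutoff, (sens, spec) in results.items():
    print(f"cut-off {cutoff}: sensitivity {sens:.2f}, specificity {spec:.2f}")
# cut-off 0.3: sensitivity 1.00, specificity 0.43
# cut-off 0.5: sensitivity 0.80, specificity 0.71
# cut-off 0.7: sensitivity 0.40, specificity 0.86
```

With these toy values, lowering the cut-off to 0.3 captures every reference-positive case at the cost of more false positives, and raising it to 0.7 does the reverse.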

Table 5 Odds ratios (95 % CI) for associations between single items (univariable models) and multi-items (multivariable model) with clinic screening ‘gold-standard’
Table 6 Discriminate statistics for combined self-report multi-item decision models against clinic screening ‘reference-standard’

Discussion

In this study we explored whether self-report of symptoms is an accurate way of defining sciatica cases. The results suggest that self-reported items alone are not sufficiently accurate for selecting subjects with sciatica in epidemiological studies.

Pain below the knee was the best single item for diagnostic accuracy, with an AUC of 0.67, which is nevertheless below the threshold for ‘acceptable discrimination’. In this cohort, the commonly used (or suggested) proxies of ‘pain radiating to the legs’ [9] and ‘below the knee’ [11, 13] overestimate the prevalence by 170 and 39 %, respectively. In contrast, ‘pain radiating to the leg(s) that worsens with coughing and sneezing’ [15] underestimates the prevalence by 31 %. Sensitivity and specificity varied widely across the individual self-report items, but the discriminative ability of the individual items was low, as reflected by positive and negative likelihood ratios falling mostly under 2 and above 0.5, respectively.
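The 170 % figure can be reproduced from counts given in the Methods and Results, under one plausible reading: the proxy ‘pain radiating to the legs’ counts all 511 participants as cases (all of them reported such pain), and the comparison is with the 189 clinically confirmed cases. This reading is an assumption, not something the text states explicitly. The figures for the other two proxies depend on item counts in Table 3, which are not reproduced here.

```latex
\[
\text{relative overestimate}
  = \frac{n_{\text{proxy}} - n_{\text{clinical}}}{n_{\text{clinical}}}
  = \frac{511 - 189}{189}
  = \frac{322}{189}
  \approx 1.70 \quad (170\,\%)
\]
```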

The individual items were not independent in their predictive capacity (Table 4). However, a cluster of three self-reported items (distribution of pain below the knee, leg pain worse than back pain, and a feeling of numbness or pins and needles in the leg) did improve discrimination to an ‘acceptable’ level, with an AUC of 0.72 in respect of the confirmatory diagnostic reference and 0.74 in respect of the indicative reference. However, the likelihood ratios from the model [model 2(iii)] were indicative of a ‘small’ amount of discrimination. Approximately half of all clinically confirmed sciatica cases were misclassified as non-cases. As indicated by the higher NPV of 0.76 compared with the PPV of 0.59, in relation to clear cases of nerve root involvement (the confirmatory classification) a negative test result [according to the three-item model 2(iii)] was more likely to predict absence of nerve root involvement than a positive test result was to predict its presence. Sensitivity and specificity were similar, at about 0.7, when the self-report models were tested against the less strict indicative/possible diagnostic criteria; the PPV and NPV also gave similar probabilities of about 0.7 for correct predictions. However, irrespective of which diagnostic classification is of interest, the findings indicate that the probability of false test results would be high overall.

Vroomen et al. [24] previously reported on the contribution of history and examination items to the diagnosis of lumbar radiculopathy due to a disc prolapse and suggested that examination items contribute little once the history items have been established. History items are mainly self-reported, but in the context of a clinical examination clarification can be obtained, which most likely improves the accuracy of reporting. In addition, the population in that study [24] was a highly selected cohort of patients, a factor that is likely to contribute to high diagnostic accuracy as such patients tend to be at the more severe end of the spectrum.

Our findings suggest that when the objective of a study is to capture the presence of back pain that spreads to the leg(s), perhaps as an index of severity, proxies such as ‘pain down the leg’ or ‘pain below the knee’ may be acceptable, but for specific presentations such as sciatica such proxies do not seem sufficiently accurate. Even with a cluster of positive symptoms [Models 1 and 2(iii)], the discrimination, although acceptable, may still be considered problematic, as indicated by the low sensitivity estimates.

A number of points pertaining to the strengths and limitations of this study merit further discussion. The first concerns the self-reported items selected as potential indicators of sciatica. We believe that these are in accordance with the current literature as potentially contributing to the clinical diagnosis or impression of the presence or absence of sciatica.

The second point is the acceptance of the clinical judgement of the assessing physiotherapists as the ‘reference standard’. The absence of a ‘gold reference standard’ in sciatica is well documented in the relevant literature [7, 25], and clinical diagnosis does serve as the ‘reference standard’ in a number of studies [25]. Given that patients with positive imaging findings of nerve root involvement may be asymptomatic and vice versa, and that in primary care, at least initially, diagnosis is based on clinical assessment alone, it is reasonable to use clinical diagnosis rather than imaging tests as the ‘reference standard’ in primary care. However, misdiagnosis of cases is possible.

A third point pertains to the assessors in this study. A large number of assessors participated, and although this may introduce variability it also contributes to the generalisability of the results. Moreover, all assessors underwent training and all clinical assessment information was collected in a standardised manner. All assessors were physiotherapists, and one may argue that medically trained clinicians (general practitioners, for example) may be better at diagnosing the presence or absence of nerve root involvement in patients with low back and leg pain, and therefore that the diagnostic accuracy of items could vary if diagnosis varied. However, to our knowledge, there is no evidence in the published literature to suggest a difference between health care professions, although there may be differences depending on the level of clinical experience.

The fourth point is the population studied and the method. This was a truly primary care population presenting with variable severity and duration of symptoms. There was no selection bias towards the worse cases. Patients were asked on the day to answer the set of questions that the literature suggests should be included in the assessment of symptoms of LBP with leg pain to assess the probability of nerve root involvement. The patients were asked to think about their symptoms and symptom behaviour within the last week. It is possible that some patients who had these symptoms, to a greater or lesser degree, during the last week had substantially recovered by the day of assessment, so that the findings of the clinical history and examination were negative, leading to decreased discrimination values. We do not know whether results would differ in, for example, a secondary care population, in which one would expect more severe symptoms that may be easier to recognise. Such a population, however, would be subject to selection bias.

Conclusion

Low back pain with leg pain is a common presentation, and in a number of cases the leg pain is due to spinal nerve root involvement causing radiculopathy (sciatica). Self-reported sciatica, or indicators suggestive of sciatica, have been used in studies to capture the prevalence of the condition or to explore risk factors for its onset or persistence. This is the first study to investigate the diagnostic accuracy of commonly used patient self-report items for sciatica in a primary care setting and in an unselected population presenting with LBP and leg pain. The results suggest that self-report is not an accurate method for identifying individuals with the condition and that it may overestimate or underestimate its prevalence. Certain self-report indicators, particularly pain radiating below the knee, leg pain worse than back pain, and numbness or pins and needles in the leg, can be useful at a very crude level. However, when accuracy in case definition is important, clinical examination is the recommended method.