Neck Pain and Disability Scale and the Neck Disability Index: reproducibility of the Dutch Language Versions
The first aim of this study was to translate the Neck Pain and Disability Scale (NPAD) from English into Dutch producing the NPAD–Dutch Language Version (DLV). The second aim was to analyze test–retest reliability and agreement of the NPAD–DLV and the Neck Disability Index (NDI)–DLV. The NPAD was translated according to established guidelines. Thirty-four patients (mean age 37.5 years, 68% female) with chronic neck pain (CNP), within an outpatient rehabilitation setting, participated in this study. The NPAD–DLV and the NDI–DLV were filled out twice with a mean test–retest interval of 18 days. The intraclass correlation coefficient of the NPAD–DLV was 0.76 (95% confidence interval (CI) 0.57–0.87) and of the NDI–DLV 0.84 (95% CI 0.69–0.92). The limits of agreement of the NPAD–DLV and the NDI–DLV were, respectively, ±20.9 (scale 0–100) and ±6.5 (scale 0–50). The reliability of the NPAD–DLV and the NDI–DLV was acceptable for patients with CNP. The variation (‘instability’) in the NPAD–DLV total scores was relatively large and larger than the variation of the NDI–DLV.
Keywords: Neck pain; Neck Pain and Disability Scale; Neck Disability Index; Reliability; Agreement
Neck pain is a common musculoskeletal complaint in western societies . In the large majority of cases, the pathological basis for the neck pain is unclear and the complaints are labeled as ‘non-specific’ or ‘mechanical’ . The result may be disability, i.e. limitations in activities and restrictions in participation in daily living and work [30, 32]. Most of the total costs of neck pain in the Netherlands were due to sick leave and disability payments .
Self-reported disability in patients with neck pain is often measured by means of region-specific questionnaires . These questionnaires may measure disability or functional status with greater responsiveness than generic health questionnaires . Questionnaires should have good psychometric qualities, including reproducibility . Reproducibility is the extent to which the same results are obtained on repeated tests when no real change in health status has occurred [14, 25]. Reproducibility may be influenced by random measurement error and within-patient variance [14, 27]. Both sources of variance may lead to score instability (natural variation) on repeated tests [14, 27]. If a patient with neck pain fills out the same questionnaire on two occasions, it is relevant to know what score instability can be expected over a predefined retest interval or during a waiting period. Reproducibility has two aspects, reliability and agreement, representing test–retest score stability over time at the group level and at the individual level, respectively . A measure of reliability is the intraclass correlation coefficient (ICC) . It assesses not only the strength of the correlation between two repeated measures, but also whether the repeated measures on each subject are identical and do not differ systematically . To quantify agreement, i.e. test–retest score stability over time at the individual level, the limits of agreement (LOA) can be calculated according to the method of Bland and Altman [3, 6]. The LOA lie two standard deviations of the test–retest differences (SDdifference) above and below the mean difference in total score between the first and second test across all patients. This means that, due to score instability, approximately 95% of within-patient differences on repeated tests will lie between these LOA [3, 6]. In an individual patient, the change due to treatment should exceed these LOA before one can state that real change has occurred.
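The Bland–Altman calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis (which was run in SPSS); the function names and the example scores are hypothetical, and the multiplier k = 2 follows the "two standard deviations" convention stated in the text (1.96 is the other common choice).

```python
from statistics import mean, stdev


def limits_of_agreement(test, retest, k=2.0):
    """Bland-Altman limits of agreement for paired test-retest total scores.

    Returns (mean difference, SD of the differences, lower LOA, upper LOA).
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need at least two paired scores of equal length")
    # Within-patient difference between the second and the first test.
    diffs = [second - first for first, second in zip(test, retest)]
    d_mean = mean(diffs)
    sd_diff = stdev(diffs)  # sample SD (n - 1 denominator), i.e. SDdifference
    return d_mean, sd_diff, d_mean - k * sd_diff, d_mean + k * sd_diff


def outside_loa(change, lower, upper):
    """True if an individual's change falls outside the LOA band."""
    return change < lower or change > upper


# Hypothetical test and retest total scores for five patients.
test = [20, 25, 30, 18, 22]
retest = [22, 24, 33, 17, 24]
d_mean, sd_diff, lower, upper = limits_of_agreement(test, retest)
print(f"mean diff {d_mean:.2f}, SDdifference {sd_diff:.2f}, LOA {lower:.2f} to {upper:.2f}")
```

An individual change is only treated as real when `outside_loa` returns True, mirroring the rule in the last sentence of the paragraph.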
The most widely used neck disability questionnaires are the Neck Pain and Disability Scale (NPAD)  and the Neck Disability Index (NDI) , both of which have been translated into several languages [1, 2, 5, 9, 13, 18, 19, 20, 21, 23, 24, 28, 29, 33]. To be used in different countries and social environments, these questionnaires must not only be translated properly but also culturally adapted and validated [4, 25]. Such translations allow the results of clinical trials to be compared between countries. To investigate which questionnaire is most appropriate, psychometric studies are needed in which the questionnaires are applied simultaneously to the same sample of patients . A possible advantage of the NPAD over the NDI is its simple wording and the unitary structure of its questions; moreover, its visual analog scale format makes it easy to complete [9, 23, 25, 28]. The NPAD has not been formally translated into Dutch and, consequently, the psychometric qualities of a Dutch Language Version (NPAD–DLV) are unknown. The reliability of the NDI–DLV has been studied in patients with acute neck pain, but not in patients with chronic neck pain (CNP) other than those with whiplash-associated disorder (WAD) [17, 31]. The first aim of this study was to translate the NPAD from English into Dutch. The second aim was to analyze test–retest reliability and agreement of the NPAD–DLV and the NDI–DLV in patients with CNP in an outpatient rehabilitation setting.
Translation and cross-cultural adaptation of the NPAD
The NPAD was translated using a forward and backward translation procedure . Two native Dutch speakers (a clinician aware of the concepts behind the questionnaire and a staff member of the pain rehabilitation team) independently translated and culturally adapted the original version. Each translator critically reviewed the other's version, and the two versions were compared with one another and with the original English version. Disagreements were discussed and a consensus version was produced. A backward translation (from Dutch into English) of this consensus version was made by a bilingual physiotherapist involved in spine research. The translators examined the forward translation, the backward translation and the notes on the discussions held during the translation process. A draft version of the NPAD–DLV was then pilot-tested on a heterogeneous group of ten patients and employees of the rehabilitation center, who were asked to comment critically on the understandability of the questions and instructions, the response options, the wording and the layout. Finally, the definitive NPAD–DLV was produced.
CNP patients were recruited from referrals by general practitioners or medical specialists for rehabilitation treatment at the Center for Rehabilitation of the University Medical Center Groningen, The Netherlands. Inclusion criteria were: non-specific chronic neck pain (>3 months' duration), admission for rehabilitation, age between 18 and 65 years, less than 2 years out of work due to CNP or still at work with frequent sick leave due to neck pain, and sufficient knowledge of the Dutch language to complete questionnaires. Exclusion criteria were: specific neck pain, status post surgery in the cervical region, cardiovascular or pulmonary diseases significantly diminishing physical capacity, pregnancy, drug addiction, and extensive psychological or behavioral problems.
Before their first visit to the University Medical Center, patients filled out a baseline questionnaire on demographic and clinical characteristics. During the first visit, the medical history was reviewed and a physical examination was performed. Immediately afterwards, patients filled out the NPAD–DLV and the NDI–DLV. A second visit was scheduled 1–5 weeks after the first, depending on the patient's availability, but before the start of the outpatient rehabilitation program. During the second visit, patients filled out the NPAD–DLV and the NDI–DLV a second time. All patients signed informed consent before entering the study.
The NPAD is a questionnaire whose development used the Million Visual Analogue Scale (VAS) as a template . The NPAD consists of 20 items, each with a 100 mm VAS with numeric anchors at 0, 1, 2, 3, 4 and 5 (20 mm apart). Item scores range from 0 (no pain or limitation in activities) to 5 (as much pain as possible or maximal limitation). The total NPAD score can vary from 0 to 100 points . Test–retest reliability, expressed as the intraclass correlation coefficient (ICC), of different language versions of the NPAD ranged from 0.81 to 0.98, with retest intervals ranging from 1 day to 1–2 weeks [2, 9, 19, 21, 33].
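As a rough sketch of how a 0–100 total arises from twenty 0–5 items, the snippet below converts VAS marks into item scores and sums them. It assumes (this is not stated in the source) that each item score is the distance of the patient's mark from the left end of the 100 mm line divided by 20 mm, so that the 0–5 anchors map onto the line; the names are hypothetical and no rule for missing items is modelled.

```python
N_NPAD_ITEMS = 20
VAS_LENGTH_MM = 100.0
MM_PER_POINT = 20.0  # assumption: the numeric anchors 0..5 are 20 mm apart


def npad_item_score(mark_mm):
    """Convert a VAS mark (mm from the left anchor) into a 0-5 item score."""
    if not 0.0 <= mark_mm <= VAS_LENGTH_MM:
        raise ValueError("VAS mark must lie on the 100 mm line")
    return mark_mm / MM_PER_POINT


def npad_total(marks_mm):
    """Sum of the 20 item scores, giving a total between 0 and 100."""
    if len(marks_mm) != N_NPAD_ITEMS:
        raise ValueError("the NPAD has exactly 20 items")
    return sum(npad_item_score(m) for m in marks_mm)


# Hypothetical patient who marks every line at 50 mm.
print(npad_total([50.0] * N_NPAD_ITEMS))
```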
The NDI is a questionnaire based on the Oswestry low back pain disability questionnaire and consists of 10 items . Each item has six statements expressing progressive levels of pain or limitation in activities, scored from 0 (no pain or limitation) to 5 (as much pain as possible or maximal limitation). The total NDI score can vary from 0 to 50 points . Test–retest reliability, expressed as ICC, of different language versions of the NDI ranged from 0.50 to 0.97, with retest intervals ranging from 1 day to 3 weeks [11, 12, 19, 20, 22, 24, 29, 31, 33]. The ICC of the NDI–DLV was 0.90 in patients with acute neck pain in general practice over a retest interval of 7 days . For patients with WAD, the test–retest reliability of the NDI–DLV was r = 0.81 over a retest interval of 3 months .
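For comparison, NDI scoring as described above is a plain sum of ordinal item scores. The sketch assumes complete responses (the source does not describe how missing items were handled) and uses hypothetical names.

```python
N_NDI_ITEMS = 10


def ndi_total(item_scores):
    """Sum of the ten NDI item scores (each 0-5), giving a total between 0 and 50."""
    if len(item_scores) != N_NDI_ITEMS:
        raise ValueError("the NDI has exactly 10 items")
    for score in item_scores:
        if not (isinstance(score, int) and 0 <= score <= 5):
            raise ValueError("each NDI item is scored as an integer from 0 to 5")
    return sum(item_scores)


# Hypothetical responses: the index (0-5) of the statement chosen for each item.
print(ndi_total([2, 3, 1, 0, 4, 5, 2, 2, 1, 3]))
```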
Descriptive statistics were calculated for the total scores of both questionnaires at the two test sessions. Reliability of the NPAD–DLV and the NDI–DLV was expressed as ICCs for the total scores; ICCs of 0.75 or higher were interpreted as acceptable reliability . To quantify agreement (test–retest score stability at the individual level) of the NPAD–DLV and the NDI–DLV, the limits of agreement were calculated as described by Bland and Altman [3, 6]. Statistical analyses were performed with SPSS 14.0.
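The text does not state which ICC model was selected in SPSS. The sketch below assumes the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), because it penalizes systematic differences between test and retest as described in the introduction. It computes the coefficient from the ANOVA mean squares and applies the 0.75 threshold used here; the function names and data are hypothetical.

```python
from statistics import mean


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one row per patient, each row holding that patient's k repeated
    total scores (k = 2 for a test-retest design).
    """
    n, k = len(scores), len(scores[0])
    if n < 2 or k < 2 or any(len(row) != k for row in scores):
        raise ValueError("need >= 2 patients, each with the same number (>= 2) of scores")
    grand = mean(x for row in scores for x in row)
    row_means = [mean(row) for row in scores]                       # per patient
    col_means = [mean(row[j] for row in scores) for j in range(k)]  # per session
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = (ss_total - ss_rows - ss_cols) / ((n - 1) * (k - 1))
    # The k * (ms_cols - ms_err) / n term grows with systematic differences
    # between sessions, which is what makes this an absolute-agreement ICC.
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


def acceptable(icc, threshold=0.75):
    """Apply the 'ICC >= 0.75 is acceptable' criterion used in this study."""
    return icc >= threshold


# Hypothetical test/retest total scores for five patients.
pairs = [[20, 22], [25, 24], [30, 33], [18, 17], [22, 24]]
icc = icc_2_1(pairs)
print(round(icc, 3), acceptable(icc))
```

A consistency-type ICC would ignore a uniform shift between sessions; the absolute-agreement form was chosen for the sketch because the introduction explicitly asks whether repeated measures "do not differ systematically".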
Demographic and clinical characteristics of the study sample (n = 34)
Mean (SD) or Median (IQR)
Duration of chronic pain (months)
Sick leave in the past year (weeks)
Pain radiating to
Between shoulder blades
Pins and needles below elbow
Low back pain
Self-reported cause of neck pain
Motor vehicle accident
Previous treatment for neck pain
Intermediate vocational education
Work status (self-employed/employee)
Involved in litigation
Reliability and agreement
Total scores of the NPAD–DLV (n = 33) and NDI–DLV (n = 32) at test and retest, intraclass correlation coefficients (ICC), difference between test and retest and limits of agreement
ICC (95% CI)
Limits of agreement
The cross-cultural adaptation of the NPAD followed the forward and backward translation procedure. This procedure ensured both that the meaning of the original items was preserved and that the content and meaning of the questions were captured in the Dutch translation. Apart from a more detailed general instruction, only minor modifications were made, in items 13 and 18. The adaptation of item 18 is supported by the fact that the developers of the NPAD explicitly related this item to ‘neck problems’  and other authors to ‘neck dysfunction related to activities of the cervical spine’ [2, 9, 13, 23]. The new NPAD–DLV was easy to understand. Completing both the NPAD–DLV and the NDI–DLV took at most 15 min. The number of missing responses was negligible, consistent with other non-English versions of the questionnaires [1, 2, 5, 9, 13, 20, 21, 23, 24, 28, 29, 33].
The NPAD–DLV and the NDI–DLV demonstrated acceptable reliability in a sample of patients with CNP. The retest interval depended on the availability of the patient. In our rehabilitation setting, a waiting period (1–2 months) between intake and the start of the program is normal. It is therefore useful to know the extent of the changes in questionnaire outcomes that occur in the absence of treatment. The sample size of our study was similar to those of other reproducibility studies [2, 9, 12, 19, 20, 21, 22, 24, 29, 31, 33], except for four studies [9, 12, 19, 33] with sample sizes of 23, 17, 102 and 101, respectively. The female-to-male ratio in the current study was similar to that in most previous reproducibility studies [9, 12, 19, 20, 22, 31, 33].
In our study, the mean total score was 50.7 for the NPAD–DLV and 22.6 for the NDI–DLV. In other studies, mean total scores ranged from 38.2 to 60.5 for the NPAD and from 11.0 to 23.0 for the NDI [2, 9, 11, 12, 19, 20, 22, 29, 31, 33]. In general, studies carried out in tertiary referral centers report higher total scores than those in primary care settings. In all studies in which both questionnaires were used, including ours, NPAD scores were approximately 10% higher than NDI scores when expressed as a percentage of a 0–100 scale [19, 21, 24, 33].
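The comparison on a common 0–100 scale can be reproduced from this study's mean total scores. The text does not specify whether "10% higher" refers to percentage points or to a relative difference, so the sketch reports both; the helper name is hypothetical.

```python
def to_percent_of_scale(score, scale_max):
    """Express a total score as a percentage of the instrument's maximum."""
    return 100.0 * score / scale_max


# Mean total scores reported in this study.
npad_pct = to_percent_of_scale(50.7, 100)  # NPAD-DLV, scale 0-100
ndi_pct = to_percent_of_scale(22.6, 50)    # NDI-DLV, scale 0-50
point_gap = npad_pct - ndi_pct             # difference in percentage points
relative_gap = point_gap / ndi_pct         # difference relative to the NDI value
print(f"NPAD {npad_pct:.1f}%, NDI {ndi_pct:.1f}%, "
      f"gap {point_gap:.1f} points ({relative_gap:.0%} relative)")
```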
The reliability of the NPAD–DLV in our study (ICC = 0.76) was lower than in reliability studies with shorter retest intervals (less than 2 weeks, ICC = 0.81–0.98 [2, 9, 19, 21, 33]). The reliability of the NDI–DLV in our study (ICC = 0.84) was somewhat lower than in most previous NDI studies, which generally had shorter retest intervals (less than 2 weeks, ICC = 0.50–0.97 [11, 19, 22, 24, 29, 31, 33]; 2 weeks, ICC = 0.88 ; 3 weeks, ICC = 0.68 ). Perceived recovery (change) during the retest interval, used to restrict the reliability analyses to ‘stable’ patients, was assessed in only half of the above-mentioned NPAD and NDI studies [2, 29, 31, 33]. This does not appear to have produced differences in the magnitude of the ICCs [2, 9, 20, 21, 22, 24, 29, 31, 33]. There does appear to be a trend toward higher ICCs in studies with shorter retest intervals; the same trend has been reported for another region-specific questionnaire . To test for bias caused by differences in retest interval duration, we calculated a partial test–retest correlation, i.e. the test–retest correlation for the NPAD–DLV and the NDI–DLV controlling for retest interval duration. The Pearson test–retest correlations with and without controlling for retest interval duration were r = 0.70 and r = 0.72 for the NPAD, and r = 0.87 and r = 0.87 for the NDI. These results indicate that the effect of retest interval duration was minimal or negligible.
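A first-order partial correlation of this kind can be computed from three pairwise Pearson correlations: between the test and retest scores (x, y), and between each of them and retest interval duration (z). The sketch below assumes the standard textbook formula; it is an illustration, not the authors' SPSS procedure, and the function names and data are hypothetical.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_from_r(r_xy, r_xz, r_yz):
    """Correlation of x and y after removing the linear effect of z from both."""
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


def partial_corr(x, y, z):
    """Test-retest correlation (x, y) controlling for retest interval z."""
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


# Hypothetical test/retest totals and retest intervals (days) for six patients.
test = [40, 55, 62, 30, 48, 70]
retest = [44, 50, 65, 35, 45, 72]
days = [7, 14, 21, 28, 30, 35]
print(round(pearson(test, retest), 2), round(partial_corr(test, retest, days), 2))
```

If the controlled and uncontrolled coefficients are close, as reported above, the interval duration explains little of the test–retest association.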
The NPAD is claimed to have four underlying dimensions . Factor analyses of NPAD versions in other languages identified two to four factors, with different items loading on them [9, 13, 23, 24, 33]. The factor structure presented in the original publication was based on a relatively small sample (n = 95); the stability of that factor solution may therefore be questioned, and it may be too sample-specific to be reproduced in other samples. This makes comparison of subscale ICCs across NPAD studies in different languages difficult. However, a principal component analysis in a German study with a sample of 448 patients indicated a one-factor solution for the NPAD, and it was concluded that the NPAD is a multidimensional assessment instrument that measures different dimensions of a single construct, neck pain, in a stable manner . For these reasons, and because factor analysis was not an aim of the current study, only total scores were used to analyze the reliability of the NPAD–DLV.
If a patient with neck pain fills out the same questionnaire on two occasions during the waiting period before a rehabilitation program, a (very) short retest interval increases the probability of carryover or recall effects due to memory, mood or practice, whereas a longer interval increases the probability that the clinical status has changed and that the scores from the first session have been forgotten . There are several possible explanations for changes in clinical status during the waiting period: the effect of the clinic consultation, the patient's anticipation of the program, the effect of waiting before the actual rehabilitation program starts, the fluctuating course of the chronic neck pain itself, and the questionnaires themselves .
To quantify agreement, i.e. test–retest score stability over time at the individual level, the limits of agreement (LOA) were calculated. No criteria are available for interpreting the LOA; narrower LOA indicate greater stability, i.e. smaller natural variation. The SDdifference (the standard deviation of the differences in total score between the first and second test across all patients) and the LOA of the NPAD–DLV in the present study (10.4 and ±20.9) were somewhat larger than in one other study (9.0 and ±17.9) with a retest interval of 1 day . In this French study, a 5-point ordinal transition scale was used to select clinically stable patients. Despite the clear difference in retest intervals, the difference in SDdifference was small. The SDdifference and the LOA of the NDI–DLV in the present study (3.2 and ±6.5) were similar to those of most other NDI studies with shorter retest intervals (1 day, SDdifference 3.4, LOA ±6.7 ; 1 week, SDdifference 3.9, LOA ±7.8 ; 1 week, SDdifference 1.5, LOA ±3.0 ; 1–2 weeks, SDdifference 4.4, LOA ±8.9 ). In three of these four studies, a transition scale was used to select clinically stable patients [22, 29, 31, 33]. In proportional terms, the SDdifference of the Greek study  was similar to that of the present study (SDdifference/mean total score of 0.12 and 0.14, respectively). NDI reliability studies have shown smaller SDdifference values, and thus smaller natural variation, than NPAD reliability studies [22, 29, 31, 33]. The greater instability of the NPAD may be explained by differences in how ‘neck disability’ is operationalized in the items of the NPAD and the NDI [30, 32]. Post hoc analysis showed that the natural variation of the NPAD–DLV could not be attributed to individual items of the questionnaire. In an individual patient, the clinical effect of therapy should exceed the limits of agreement before one can state that real change has occurred.
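The proportional comparison above divides SDdifference by the mean total score. A minimal sketch using the rounded values reported in this study (the helper name is hypothetical):

```python
def proportional_sd(sd_difference, mean_total):
    """SDdifference expressed relative to the mean total score."""
    if mean_total <= 0:
        raise ValueError("mean total score must be positive")
    return sd_difference / mean_total


# Rounded values reported in this study.
ndi_ratio = proportional_sd(3.2, 22.6)    # NDI-DLV
npad_ratio = proportional_sd(10.4, 50.7)  # NPAD-DLV
print(f"NDI-DLV {ndi_ratio:.2f}, NPAD-DLV {npad_ratio:.2f}")
```

On these rounded figures the NDI–DLV ratio reproduces the 0.14 quoted above, and the same ratio for the NPAD–DLV comes out at roughly 0.21.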
The minimal clinically important difference (MCID) has been suggested to be 11 points on the NPAD  and 2 to 10 points on the NDI [11, 12, 26, 31]. Based on the variation in the current study, a patient's score must change by at least 21 points on the NPAD–DLV (scale 0–100) or by at least 7 points on the NDI–DLV (scale 0–50) before that patient can be judged to have ‘really’ changed.
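The decision rule in the preceding paragraph can be written as a small check. This sketch assumes that the LOA are centred on a mean difference of approximately zero, so that only their reported half-widths (±20.9 and ±6.5) are needed; the dictionary keys and function names are hypothetical.

```python
import math

# Reported LOA half-widths in this study (assumed to be centred on zero).
LOA_HALF_WIDTH = {"NPAD-DLV": 20.9, "NDI-DLV": 6.5}


def smallest_real_change(instrument):
    """Smallest whole-point change that lies strictly beyond the LOA."""
    return math.floor(LOA_HALF_WIDTH[instrument]) + 1


def is_real_change(before, after, instrument):
    """True if an individual change exceeds the limits of agreement."""
    return abs(after - before) > LOA_HALF_WIDTH[instrument]


for name in LOA_HALF_WIDTH:
    print(name, smallest_real_change(name))
```

For the two instruments this reproduces the 21-point and 7-point thresholds stated above.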
There are limitations to consider in evaluating our research. First, the sample size was relatively small (n < 50); our sample may therefore have misestimated the ‘true’ population ICCs and LOAs. However, the ICCs of the NPAD–DLV and NDI–DLV were in line with those of two studies with larger samples, which reported ICCs for the NPAD and NDI of 0.81 and 0.86, respectively (n = 102) , and of 0.91 and 0.93 (n = 101) . Second, the retest interval was longer than in most other NPAD and NDI studies [2, 9, 11, 19, 20, 21, 22, 24, 29, 31, 33]. The ICC and LOA values reported in the present study therefore probably underestimate the reliability and agreement of the scales. Nevertheless, retest interval duration may not be the only important factor influencing these values, given an NDI study with a retest interval of 2.5 days (SD ± 0.95) in which the ICC was 0.50 . Other factors, such as symptom duration (acute, sub-acute or chronic), patient setting (primary, secondary or tertiary care) and the mean disability score on the questionnaires, may also influence the ICC and LOA. Third, the retest interval was not fixed, and perceived recovery (change) during this interval was not controlled for. However, the SDdifference of the NDI–DLV in our sample was similar to that in studies in which change was controlled for.
A strength of this study is that, to the authors’ knowledge, it is only the second reproducibility study of the NPAD to report both reliability and limits of agreement in a head-to-head comparison with the NDI. Further study of the NPAD–DLV is needed to assess reliability and agreement in other patient groups (e.g. acute, sub-acute and primary care patients), to assess the ICC over a shorter retest interval, and to study other measurement properties, such as validity, responsiveness and MCID.
A reliable DLV of the NPAD was developed. The reliability of the NPAD–DLV and the NDI–DLV was acceptable for patients with CNP within an outpatient rehabilitation setting. The natural variation (‘instability’) in the NPAD–DLV total scores was relatively large and larger than the variation of the NDI–DLV.
The authors thank Marleen Speller for assistance in data collection.
This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
- 12. Cleland JA, Fritz JM, Whitman JM, Palmer JA (2006) The reliability and construct validity of the Neck Disability Index and patient specific functional scale in patients with cervical radiculopathy. Spine (Phila Pa 1976) 31(5):598–602
- 17. Heymans WFGJ, Lutke Schipholt HJA, Elvers JWH et al (2002) Neck Disability Index Dutch Version (NDI–DV): investigation of reliability in patients with chronic whiplash. Ned T Fysiother 112:94–99
- 18. Köke AJA, Heuts PHTG, Vlaeyen JWS et al (1996) Neck Disability Index. Pijn Kennis Centrum Maastricht. Meetinstrumenten chronische pijn, Maastricht
- 26. Pool JJ, Ostelo RW, Hoving JL, Bouter LM, de Vet HC (2007) Minimal clinically important change of the Neck Disability Index and the Numerical Rating Scale for patients with neck pain. Spine (Phila Pa 1976) 32(26):3047–3051
- 27. Portney LG, Watkins MP (2000) Foundations of clinical research, 2nd edn. Prentice-Hall, Upper Saddle River