Background

Patient-rated outcome measures (PROMs) are an important part of clinical decision making in the rehabilitation of patients with shoulder pain. A number of PROMs are available[1, 2], of which the Disabilities of the Arm, Shoulder and Hand (DASH) outcome measure is one of the most commonly used. The DASH is a condition-specific PROM developed to assess physical disability and symptoms in people with musculoskeletal disorders of the upper extremity[3–5]. The original English version of the DASH has been translated and adapted into many languages[5, 6]. It has been shown to be reliable, valid and responsive in patients with shoulder pathologies[1, 4, 7].

Shoulder impingement syndrome is the most common clinical diagnosis for patients with shoulder pain[8, 9] and can be defined as a symptomatic compression of the subacromial structures during elevation of the arm[10, 11]. The main symptom is anterior-lateral shoulder pain when lifting the arm above shoulder level and during overhead activities. Contributing factors to the development of shoulder impingement syndrome include inflammation of the tendons and bursa, degeneration of the tendons, postural dysfunctions and weak or dysfunctional rotator cuff and scapular musculature[10, 11]. Various terms are used for the same condition such as rotator cuff disease/tendinopathy, subacromial impingement syndrome and painful arc syndrome.

Measurement properties of an outcome measure are related to the population and context in which it is used. Before a translated outcome measure can be used with confidence in clinical or research settings, the measurement properties of the translated version need to be determined[12]. The measurement properties of the Norwegian language version of the DASH have not previously been investigated in non-rheumatic patients with shoulder pain. The objective of this study was to examine the reliability and construct validity of the DASH in patients with shoulder impingement syndrome.

Methods

Patients

The patients were recruited from an outpatient clinic from December 2007 to October 2010. We included adult patients with the primary diagnosis of shoulder impingement syndrome (M75.4 in the ICD-10). The patients were diagnosed by an orthopaedic surgeon and screened for inclusion in the study by a physiotherapist. The impingement diagnosis was based on reported symptoms and clinical findings such as anterior-lateral shoulder pain worsening during elevation of the arm and overhead activities, normal or close to normal passive range of motion of the shoulder and a positive impingement sign[13].

Patients were excluded from the study if they had generalized pain, symptoms of cervical spine disease, had undergone surgery in the affected shoulder within the last six months or were unable to understand written and spoken Norwegian. In addition, patients were excluded if they had been diagnosed with any rheumatologic illness, chronic systemic disease or cardiac disease.

Measures

The Disabilities of the Arm, Shoulder and Hand questionnaire (DASH)

The DASH was developed by the Institute for Work and Health and the American Academy of Orthopaedic Surgeons (AAOS)[3–5]. It was designed to be a discriminative and evaluative measure of physical disability and symptoms in patients with musculoskeletal disorders of the upper extremity, assessing the condition of the patient during the past week. It measures whether the respondent has the capacity to do an activity, regardless of how it is performed. The main part of the questionnaire, the DASH disability/symptoms score, contains 30 items: 21 items about the ability to perform certain physical activities, 5 items about the severity of pain, activity-related pain, tingling, weakness and stiffness and 4 items concerning the effect of the upper extremity problem on social activities, work, sleep and self-image. Each item is scored on a five-point ordinal scale. To calculate the DASH score, all completed responses are summed and averaged. One is subtracted from this average and the result is multiplied by 25, giving a total score ranging from best to worst on a 0–100 scale. At least 27 of the 30 items must be completed to calculate a score. The translation and adaptation process of the Norwegian language version of the DASH is described by Finsen (in Norwegian)[14].
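The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than official scoring software: the function name `dash_score`, the coding of the five response options as 1 to 5, and the use of `None` for a missing item are assumptions made for the example.

```python
from typing import Optional, Sequence


def dash_score(responses: Sequence[Optional[int]]) -> Optional[float]:
    """Return the DASH disability/symptoms score (0 = best, 100 = worst).

    `responses` holds the 30 item responses, assumed to be coded 1-5,
    with None marking a missing item. Returns None when fewer than
    27 items are completed, since no score can then be calculated.
    """
    if len(responses) != 30:
        raise ValueError("expected 30 item responses")
    completed = [r for r in responses if r is not None]
    if any(r not in (1, 2, 3, 4, 5) for r in completed):
        raise ValueError("item responses must be coded 1-5")
    if len(completed) < 27:
        return None
    # Average the completed items, subtract one, multiply by 25.
    mean_response = sum(completed) / len(completed)
    return (mean_response - 1) * 25
```

With this coding, a respondent who answers 3 on 27 items and leaves three items blank receives a score of 50, while a respondent with only 26 completed items receives no score.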

The Shoulder Pain and Disability Index (SPADI)

The SPADI was designed to measure pain and disability in patients with shoulder disorders[15]. A Norwegian version is available[16]. It is a 13-item PROM divided into two subscales: the five-item pain subscale and the eight-item disability subscale. Each item in the original version is scored on a visual analogue scale from 0 (no pain/no difficulty) to 11 (worst pain imaginable/so difficult required help). The pain and disability scores are equally weighted and added for the total SPADI score, ranging from best to worst on a 0–100 scale. The Norwegian version of the SPADI has been shown to have acceptable reliability and validity in patients with rotator cuff disease[17].

The Short Form 36 Health Survey (SF-36)

The 36-item Short Form Health Survey is a generic PROM developed to assess eight health domains[18]: physical functioning (PF), role-physical (RP), bodily pain (BP), general health (GH), vitality (VT), social functioning (SF), role-emotional (RE) and mental health (MH). Each of the eight domains is scored worst to best on a 0–100 scale. Two summary scores representing physical (Physical component summary; PCS) and mental health (Mental component summary; MCS) can be calculated. We used the Norwegian version 1.2[19]. The measurement properties of the SF-36 have been tested extensively[18].

Numeric Pain Rating Scale (NPRS)

The patients were asked to rate their average pain over the week prior to assessment using an 11-point Numeric Pain Rating Scale from 0 (no pain) to 10 (worst possible pain)[20]. The NPRS is a commonly used outcome measure for patients with shoulder pain and has been shown to be reliable and valid[21].

Procedures

Each patient was scheduled for two visits approximately one week apart. At the first visit the patients filled out the DASH, SPADI, SF-36 and Numeric Pain Rating Scale. Descriptive data such as age, sex, symptom duration and employment status were also collected. At the second visit the patients filled out the DASH and SPADI and they were asked if the shoulder condition had changed since the first visit. The patients did not receive any treatment between the first and the second visit. The study protocol was approved by the Regional committees for medical and health research ethics (REC South East), and was carried out in accordance with the Helsinki Declaration. We obtained written informed consent from all the participants at inclusion.

Statistical analysis

QualityMetric Health Outcomes Scoring Software 3.0 (QualityMetric Inc, Lincoln, Rhode Island, USA) was used to manage missing SF-36 data and to calculate the SF-36 scores. PAWS Statistics 18.0 for Windows (SPSS Inc, Chicago, Illinois, USA) and Stata/IC 11.2 for Mac (StataCorp, College Station, Texas, USA) were used for the other analyses. Floor and ceiling effects were evaluated using histograms, and were considered to be present if more than 15% of the respondents achieved the lowest or highest possible score, respectively[22]. P-values less than 0.05 were considered statistically significant.
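The 15% criterion for floor and ceiling effects can be expressed as a simple proportion check. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it assumes scores on the 0–100 DASH scale rather than reproducing the histogram-based inspection used in the study.

```python
from typing import Dict, Sequence


def floor_ceiling_effects(
    scores: Sequence[float],
    lowest: float = 0.0,
    highest: float = 100.0,
    threshold: float = 0.15,
) -> Dict[str, bool]:
    """Flag a floor or ceiling effect when more than `threshold` of the
    respondents achieve the lowest or highest possible score."""
    n = len(scores)
    if n == 0:
        raise ValueError("at least one score is required")
    floor_share = sum(1 for s in scores if s == lowest) / n
    ceiling_share = sum(1 for s in scores if s == highest) / n
    return {
        "floor": floor_share > threshold,
        "ceiling": ceiling_share > threshold,
    }
```

For example, if 2 of 10 respondents score 0, the floor share is 20% and a floor effect is flagged; if only 1 of 10 does, it is not.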

Reliability

Internal consistency was assessed using Cronbach’s alpha and the item-to-total correlation coefficient. A Cronbach’s alpha between 0.70 and 0.95 was considered to indicate good internal consistency[22]. Item-to-total correlations above 0.3 were considered good[23]. Test-retest reliability was assessed with the intraclass correlation coefficient (ICC 2,1; two-way random effects model, single measures). The ICC classifications of Fleiss were used to interpret the ICC values[24]: ICCs above 0.75 may indicate excellent reliability, values between 0.40 and 0.75 fair to good reliability and values below 0.40 poor reliability. All patients who reported their shoulder pain as unchanged between the first and the second visit were included in the test-retest reliability analysis. Measurement error was assessed by estimating the standard error of measurement (SEM), minimally detectable change (MDC) and limits of agreement (LoA). SEM was calculated as the square root of the within-subject total variance of an ANOVA analysis[25]. The MDC was calculated as √2 × SEM[26]. The 95 percent confidence values for the SEM (SEM95) and MDC (MDC95) were calculated by multiplying by the z-score of 1.96. LoA was calculated according to the Bland-Altman method[27] and a LoA plot was made for visual judgement. A paired t-test was performed to determine the systematic difference between the DASH scores at test and retest.
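To make these definitions concrete, the following Python sketch computes Cronbach’s alpha and the measurement error statistics for a two-occasion design. It is an illustration under stated assumptions, not the analysis code used in the study (which was run in PAWS and Stata): the function names are hypothetical, the within-subject variance is taken as the within-subject mean square of a one-way ANOVA with two measurements per patient, and the MDC is taken as √2 × SEM.

```python
import math
from statistics import mean, stdev, variance
from typing import Dict, Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; each row holds one respondent's item scores."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    item_vars = [variance([row[i] for row in rows]) for i in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def measurement_error(test: Sequence[float], retest: Sequence[float]) -> Dict[str, float]:
    """SEM, MDC and Bland-Altman limits of agreement for test-retest scores."""
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need paired scores from at least two patients")
    diffs = [a - b for a, b in zip(test, retest)]
    n = len(diffs)
    # With two measurements per patient, the one-way ANOVA within-subject
    # mean square equals sum(d^2) / (2n). This is an assumed reading of
    # "within-subject total variance".
    sem = math.sqrt(sum(d * d for d in diffs) / (2 * n))
    mdc = math.sqrt(2) * sem
    mean_diff = mean(diffs)
    sd_diff = stdev(diffs)
    return {
        "sem": sem,
        "sem95": 1.96 * sem,
        "mdc": mdc,
        "mdc95": 1.96 * mdc,
        "mean_diff": mean_diff,
        "loa_lower": mean_diff - 1.96 * sd_diff,
        "loa_upper": mean_diff + 1.96 * sd_diff,
    }
```

A call such as `measurement_error([30.0, 40.0, 20.0, 50.0], [28.0, 43.0, 20.0, 47.0])` (made-up scores) returns all of the quantities in one dictionary, which keeps the SEM, MDC and LoA derived from the same set of paired differences.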

Construct validity

Construct validity was evaluated by testing six a priori hypotheses for the Pearson’s correlation coefficient between the DASH and the SPADI, SF-36 and the NPRS. A priori hypotheses were made based on the conceptual model of the measures and results of previous studies. As suggested by Rowntree[28] correlation coefficients below 0.2 were considered as very weak or negligible, between 0.2 and 0.4 as weak or low, between 0.4 and 0.7 as moderate, between 0.7 and 0.9 as strong, high or marked, and above 0.9 as very strong or very high. We hypothesized a high and positive correlation (0.7 to 0.9) between the DASH total score and the SPADI total score. We expected a negative, moderate correlation (-0.4 to -0.7) with the PF, BP, and PCS score of the SF-36. Furthermore, we hypothesized a negative, moderate correlation with the SF of the SF-36. The DASH score was expected to correlate higher with the PCS of the SF-36 than with the MCS.
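The interpretation bands attributed to Rowntree above can be written as a small lookup. This is a sketch only: the function name is hypothetical, and because the text does not say which band a value exactly on a boundary (for example 0.4) belongs to, the code assumes that each band includes its lower bound.

```python
def rowntree_label(r: float) -> str:
    """Classify the strength of a correlation coefficient by its size.

    Boundary handling is an assumption: a value exactly on a cut-off
    is assigned to the higher band.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("a correlation coefficient must lie in [-1, 1]")
    size = abs(r)  # the sign gives direction, not strength
    if size < 0.2:
        return "very weak"
    if size < 0.4:
        return "weak"
    if size < 0.7:
        return "moderate"
    if size < 0.9:
        return "strong"
    return "very strong"
```

Under this scheme a hypothesized correlation of -0.4 to -0.7 corresponds to the "moderate" band, so a hypothesis of that kind is confirmed only when the observed coefficient is negative and labelled moderate.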

Results

Ninety-four patients met the inclusion criteria, of whom 29 were unwilling or unable to participate in the study and two were excluded because of generalized pain. Sixty-three patients (30 women, 33 men), with a mean age of 53 (Standard deviation [SD] 12.9) years, were included in the study. The mean duration of symptoms was 46.6 (SD 72.3) months, ranging from 2 to 420 months. Descriptive characteristics of the patients are shown in Table 1. Mean interval between the first and the second visit was 7.5 (SD 2.1) days, ranging from 7 to 21 days. Four missing values were found in both the DASH and SPADI at retest (completion rate, 93.7%) and these subjects were excluded from the analyses of test-retest reliability and measurement error. Two missing values (RP and MH) were found in the SF-36 (completion rate, 99.9%). The SF-36 records were scored with QualityMetric’s missing data estimation (MDE). The DASH scores were considered to be normally distributed. No floor or ceiling effects were found. The scores ranged from 8.3 to 58.6 at the first visit and from 5.0 to 58.6 at the second visit.

Table 1 Descriptive characteristics of the patients n = 63

Reliability

Reliability data are presented in Table 2. The internal consistency estimate (n = 63) of Cronbach’s alpha was 0.93. Item-to-total correlations ranged from 0.36 to 0.81. The DASH mean score (n = 59) at the first visit was 29.5 (SD 14.0) and at the second visit 28.4 (SD 14.0), giving a mean difference of 1.1 (SD 6.64) (95% confidence interval [CI] -0.65 to 2.82). ICC (n = 59) was 0.89 (95% CI 0.82 to 0.93). The SEM was 4.7 and SEM95 was 8.3. The MDC was 6.7 and MDC95 was 13.1. The 95% LoA was calculated to be between -11.9 and 14.1, and 4 out of 59 (6.8%) were outside the LoA (Figure 1).
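As a consistency check, the limits of agreement and the MDC95 can be recovered from the reported summary values. The notation is introduced here for illustration only: \bar{d} denotes the mean test-retest difference and SD_d the standard deviation of the differences.

```latex
\begin{aligned}
\text{LoA}_{\text{lower}} &= \bar{d} - 1.96 \times SD_d = 1.1 - 1.96 \times 6.64 \approx -11.9 \\
\text{LoA}_{\text{upper}} &= \bar{d} + 1.96 \times SD_d = 1.1 + 1.96 \times 6.64 \approx 14.1 \\
\text{MDC}_{95} &= 1.96 \times \text{MDC} = 1.96 \times 6.7 \approx 13.1
\end{aligned}
```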

Table 2 Cronbach’s alpha, item-to-total correlations, intraclass correlation coefficient (ICC), standard error of measurement (SEM) and minimally detectable change (MDC95) of the Norwegian version of the DASH
Figure 1

Limits of agreement plot n = 59. The difference in scores between the DASH test and retest plotted against the average score of both test occasions. The middle dotted line represents the mean difference (1.1). The bottom and top dotted lines represent the 95% limits of agreement (-11.9, 14.1). Four out of 59 (6.8%) were outside the limits of agreement.

Construct validity

The correlation coefficients between the DASH and the SPADI, SF-36 and NPRS are shown in Table 3. We found a high positive correlation between the DASH and SPADI (0.75). The DASH showed a moderate negative correlation with the PF (-0.48), BP (-0.62), and PCS (-0.59) of the SF-36, and a moderate positive correlation with the NPRS (0.58). The DASH correlated higher with the PCS (-0.59) than with the MCS (-0.17) score. It had a low negative correlation with the SF (-0.35) of the SF-36. All correlations were in line with the a priori hypotheses except the low negative correlation with the SF of the SF-36, which means that five of the six hypotheses (83%) were confirmed.

Table 3 Descriptive statistics of the DASH, SPADI, SF-36 and NPRS at test and Pearson’s correlation coefficients for DASH n = 63

Discussion

Our results provide evidence for good reliability and validity of the Norwegian language version of the DASH in patients with shoulder impingement syndrome. The results are comparable to those reported for the original English version and other language versions.

Reliability

The Cronbach’s alpha coefficient of 0.93 indicated good internal consistency and is similar to previously reported values. In the original version[5] and other language versions[29–31] the reported Cronbach’s alpha was 0.96. A value of 0.93 was also reported for the Norwegian language version in patients with rheumatic diseases[32]. A Cronbach’s alpha between 0.70 and 0.95 has been proposed as a measure of good internal consistency[22].

Internal consistency assessed by item-to-total correlations ranged from 0.36 to 0.81. All item-to-total correlations were above the threshold value of 0.3, suggesting that the correlation between each item and the total score of the questionnaire was acceptable. Item-to-total correlations reported for the original English version of the DASH ranged from 0.49 to 0.87[33]. Values reported for other language versions of the DASH ranged from 0.27 to 0.88[30, 34, 35].

The test-retest ICC of the DASH was 0.89, which is considered to be excellent[24]. Studies of other language versions have also shown high test-retest reliability, with ICC values varying from 0.82 to 0.96[30, 36–42]. We retested the patients after approximately one week, which is within the recommended time frame ranging from two days to two weeks[43]. Due to this short time interval, most of the patients reported their shoulder pain as unchanged at the second visit.

In order to detect any systematic changes, the mean difference between the DASH test and retest was visualized in a limits of agreement plot. The limits of agreement plot may reveal systematic changes between the difference and the average of the DASH or outlying observations. Four out of 59 (6.8%) observations exceeded the limits of agreement. The mean difference between DASH test and retest was 1.1 (95% CI -0.65 to 2.82) and showed no systematic change. There was no apparent tendency for the mean difference to vary systematically with the average score.

The SEM was 4.7 points, SEM95 was 8.3 and the MDC95 was 13.1. These results correspond well with the measurement error values reported for the original English version, with a SEM of 4.6 points and an MDC95 of 12.8 points[40]. The interpretation of SEM95 is that if a patient has a measured DASH score of, for example, 50 points at an initial test, the clinician can be 95 percent confident that the patient’s true score lies somewhere between 42 and 58 DASH points. The MDC95 of 13.1 indicates that the clinician can be 95 percent confident that a change has occurred if the measured DASH score at retest has changed by more than 13.1 points.

A distinction between MDC and minimally important change (MIC) is useful when interpreting change scores in PROMs[26]. The MDC is a measure of the statistically important change. The MIC can be defined as the smallest change in score which is perceived as important by patients, clinicians, or relevant others[26, 44]. Different methods may be used to estimate this threshold value which indicates if a patient is better or worse[45]. A change above 15 points is found to be above most estimates of MIC for the DASH, and is considered to be the most accurate change score for discriminating between improved and unimproved patients[5].

Construct validity

Our results of construct validity agree with previous studies[5]. The expected high positive correlation between DASH and SPADI was confirmed with a correlation coefficient of 0.75. Both the DASH and SPADI intend to measure activity limitations and pain (symptoms). However, there are differences in the content of these questionnaires. The DASH is found to be more wide-ranging than the SPADI and can be linked to 23 categories of the International Classification of Functioning, Disability and Health model (ICF), whereas SPADI is linked to six categories[46].

We had hypothesized a moderate and negative correlation with the Social Functioning domain of the SF-36, because the DASH is also meant to measure components of the social dimension: family care, occupation and socializing with friends and relatives. A moderate correlation with the Social Functioning domain of the SF-36 has been reported for several other language versions, with correlation coefficients ranging from -0.53 to -0.64[31, 36, 47–49]. The expected moderate negative correlation with the SF subscale of the SF-36 was not confirmed. The low negative correlation with the SF (-0.35) may indicate that the Norwegian language version of the DASH identifies the social dimension of functional status in this population only to a limited degree, as measured by the SF-36.

DASH scores

The DASH questionnaire measures whether the respondent has the capacity to do an activity, regardless of how it is performed. It is scored from 0 (best) to 100 (worst). A mean DASH score of 10 has been reported for the general population of the United States[33]. A mean score of 13 has been reported for both the general population in Norway[50] and a working population in Germany[51]. A Norwegian study of physical function in adult acquired major upper-limb amputees reported a mean DASH score of 22.7[52]. The mean score of 29.4 (SD ± 13.8) in our study population indicated a more severe level of disability compared with these populations. The level of disability in our study population is comparable to other studies of patients with shoulder impingement syndrome[53, 54].

Study limitations

The DASH was designed to measure physical function and symptoms in patients with musculoskeletal disorders of the upper extremity. The results of this study are limited to patients with the primary diagnosis of shoulder impingement syndrome and cannot be generalized to other disorders of the upper extremity. Another limitation of this study is that we did not evaluate responsiveness, which has been defined as the ability of a questionnaire to detect change over time in the construct to be measured[55]. Responsiveness is considered an important measurement property of a PROM used for treatment evaluation and needs to be evaluated for the Norwegian version in future research.

Conclusions

This study demonstrated excellent test-retest reliability, good internal consistency and established error values for the Norwegian language version of the DASH. Furthermore, this study provided evidence supporting the DASH as a valid measure of physical disability and symptoms in patients with shoulder impingement syndrome.