Background

The assessment process in both the clinical and research setting has progressively incorporated patient-reported outcome (PRO) measures, and upper limb assessment is no exception. Regional and condition-specific PROs enable the quantification of patient impairment [1]. Doward and McKenna have described this as a '...needs based approach' [2]. This assists the clinical decision-making process [3, 4] and facilitates compliance with the protocols within professional organizations [5], government agencies [6, 7] and insurer groups [8]. There are limited upper limb PROs developed specifically for the region as a single kinetic chain [9] that accommodate the requirements of both the clinician and researcher in an efficient and effective manner [10, 11].

The 30-item Disabilities of the Arm, Shoulder and Hand (DASH) [9] is reported to fulfil these criteria. It was validated for a variety of disorders [1, 9, 12-16] and its availability in different languages has increased rapidly [17-19]. The shorter 11-item QuickDASH was developed to reduce respondent and administrative burden and eliminate item redundancy. This improved compliance [20], reduced item redundancy and increased scale width for higher impairment conditions [21]. Consequently, there is an impetus for the QuickDASH to replace the DASH [22] and be advocated as a criterion standard for upper limb measurement [23, 24]. However, the validity of the QuickDASH has been questioned as a consequence of conflicting findings on the factor structure [25, 26]. A single factor structure is an essential property of all PROs that provide a single summated score [27]. A PRO must exhibit a single predominant theme or factor, such as upper limb function, that is common to all item-questions. The factor structure must be unidimensional when analyzed. The most appropriate method is Maximum Likelihood extraction (MLE) [28].

A literature search (PubMed, Medline, CINAHL, Embase, Cochrane and Google Scholar) found five prospective studies that investigated the QuickDASH. They considered the psychometric and practical characteristics in general populations [24, 25, 29], burns patients [30] and as a work injury prediction tool [31]. The original validation [20] and several subsequent studies reanalyzed data with the eleven items extracted from existing 30-item DASH responses [21, 26, 32]. Only two studies investigated the QuickDASH factor structure. There was a unidimensional structure in the prospective study on the Japanese-language version [25] but a bidimensional structure in reanalyzed extracted data of the French-language version [26]. Both authors used principal component analysis which is considered inappropriate for PROs [28]. Factor structure in the English-language version has not been reported. Consequently, the factor structure must be clarified and determined prospectively with appropriate item-extraction methodology in a general upper limb population.

The primary aim of this study was to determine the factor structure of the QuickDASH and QuickDASH-9. If unidimensional and valid, the next step was to calibrate and validate the psychometric properties and practical characteristics in independent general upper limb populations. Finally these characteristics were compared and correlated with the original full-length DASH and a validated criterion standard, the Upper Limb Functional Index (ULFI) [11, 33].

Methods

Development of the QuickDASH-9

The concept-retention methodology used to reduce the full-length DASH to the QuickDASH [20, 34] was employed to produce the QuickDASH-9 (Figure 1). The authors used consensus agreement following feedback from a practicality focus-group composed of 20 patients and five therapists. This ensured face and content validity would be consistent with the QuickDASH. Items #10 (Pins and needles) and #11 (Sleep) were removed as neither is an activity of daily living. It was hypothesised these changes would enable the QuickDASH-9 to exhibit a unidimensional factor structure. The scoring system was also modified from the existing 1-5 scale to a 0-4 scale and the calculation for scoring adjusted accordingly.

Figure 1. QuickDASH-9.

Design

A two stage observational study was used. Stage 1, calibration, extracted the items from the DASH responses in a previous study [11] to form the QuickDASH-9 and QuickDASH. Stage 2, prospective validation, concurrently measured the QuickDASH and ULFI. The QuickDASH-9 scores were determined from extracted QuickDASH responses (Figure 2).

Figure 2. Flow chart of calibration (stage 1) and validation (stage 2). All QuickDASH-9 data was extracted from the QuickDASH; n = total number of participants; nR = total number of responses; practicality n = 25, comprising 20 patients and 5 therapists.

Assessment Questionnaires

The DASH is a four-page 30-item PRO on a 5-point Likert scale (1-5). Subsequent raw scores range from 30 to 150 and are converted to a percentage, 0 (no disability) to 100 (most severe disability) [9]. It has two optional scales, one for sport or music and one for work, which were not used in this study. Up to three missing responses are permitted [35]. The QuickDASH is a single-page PRO with eleven items extracted from the DASH [20]. It uses the DASH scale and scoring method and allows for one missing response [9].
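The conversion from 1-5 responses to a 0-100 score can be illustrated with a short sketch. It assumes the commonly used DASH form, in which the mean of the answered items is shifted to a 0-4 range and multiplied by 25; the function name and the use of None for a missing response are illustrative choices and not part of the instrument.

```python
from typing import Optional, Sequence


def dash_percentage(responses: Sequence[Optional[int]],
                    max_missing: int) -> Optional[float]:
    """Convert 1-5 Likert responses to a 0-100 disability score.

    None marks a missing response. The function returns None when more
    responses are missing than the instrument permits.
    """
    answered = [r for r in responses if r is not None]
    if len(responses) - len(answered) > max_missing:
        return None
    if any(r < 1 or r > 5 for r in answered):
        raise ValueError("responses must be on the 1-5 scale")
    mean_response = sum(answered) / len(answered)
    # Shift the 1-5 mean to a 0-4 range, then stretch it to 0-100.
    return (mean_response - 1) * 25


# 30-item DASH (up to three missing) and 11-item QuickDASH (one missing).
dash_score = dash_percentage([3] * 30, max_missing=3)
quickdash_score = dash_percentage([3] * 10 + [None], max_missing=1)
```

Averaging over the answered items, instead of summing all items, is what allows a response set with a permitted number of gaps to remain on the same 0-100 scale.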

The QuickDASH-9 is a single-page PRO with nine items extracted from the QuickDASH and DASH. It uses the DASH scoring method on a 0-4 Likert scale and allows for one missing response (Figure 1).
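A minimal sketch of how QuickDASH-9 scores might be derived from completed QuickDASH forms is given below. It assumes the eleven responses are supplied in item order on the original 1-5 scale, drops items #10 and #11, rescales to 0-4 by subtracting one, and multiplies the mean by 25. The function name and the None convention for a missing response are hypothetical.

```python
from typing import Optional, Sequence

REMOVED_ITEMS = (10, 11)  # QuickDASH item numbers omitted from the QuickDASH-9
MAX_MISSING = 1           # one missing response is allowed


def quickdash9_from_quickdash(
        responses_1_to_5: Sequence[Optional[int]]) -> Optional[float]:
    """Score the nine retained items on a 0-4 scale as a 0-100 percentage."""
    if len(responses_1_to_5) != 11:
        raise ValueError("expected the 11 QuickDASH responses in item order")
    kept = [r for item, r in enumerate(responses_1_to_5, start=1)
            if item not in REMOVED_ITEMS]
    # Rescale each answered item from 1-5 to 0-4.
    rescaled = [r - 1 for r in kept if r is not None]
    if len(kept) - len(rescaled) > MAX_MISSING:
        return None
    return sum(rescaled) / len(rescaled) * 25
```

With a zero-based anchor the shift step of the 1-5 calculation disappears, so the score reduces to the mean item response multiplied by 25.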

The ULFI is a single-page 25-item PRO on a 3-point Likert scale. Subsequent raw scores range from 0-25 and are multiplied by four to provide a percentage scale, 0 (normal) to 100 (maximum impairment). Up to two missing responses are permitted [11, 33]. It has an 11-point global 'Numeric Rating Scale' (NRS) to assess overall status with anchors of 0 ('normal or pre-injury') to 10 ('worst possible'). Two optional components provide a qualitative 'Patient Specific Index' and a self-assessed ranking of duties that can be used to calculate a 'Global Assessment of Body And Limbs' score [36].

Setting and Participants

Participants with upper limb musculoskeletal conditions under referral from a medical practitioner were recruited consecutively or successively from primary care physical therapy outpatient clinics. These conditions included soft tissue injury, post surgery, lymphoedema, fractures, chronic regional pain and trauma. Exclusion criteria were <18 years of age, difficulty with English language comprehension and cognitive impairment. Symptom duration ranged from one week to eight years with a mean of 38.7 ± 41.6 weeks. Removal of one outlier at eight years reduced the mean duration to 20.3 ± 25.2 weeks.

Participants receiving ongoing treatment during both the calibration and validation stages were measured at baseline, then at two weekly intervals for six weeks, then four weekly thereafter until discharge. Status was classified as: acute - injured within the previous six weeks; subacute - six to twelve weeks; and chronic - greater than twelve weeks [37].

Study Stages

Stage 1, calibration

Existing data was reanalyzed. This included 211 responses from 137 participants from nine physical therapy outpatient centres in three different Australian states. The methodology is described in a previous publication by the authors [11]. Demographic details are presented in Table 1.

Table 1 Participant demographics

Stage 2, validation

A prospective investigation examined 184 responses from 67 participants, recruited from six physical therapy outpatient centres in one Australian state. Demographic details are presented in Table 1. Repeated measures were made for subgroups of responsiveness (n = 64) and reliability (n = 22). This provided prospective investigation of the concurrently completed QuickDASH and ULFI to determine psychometric and practical characteristics (Figure 2). All QuickDASH-9 responses were extracted from the QuickDASH.

Analysis - Methodological Characteristics

Test-retest reliability

The intraclass correlation coefficient, ICC (2,1) [38], was calculated at 72 hours from baseline during a period of non-treatment, with the NRS as an external reference [3, 9, 11].
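For readers unfamiliar with this coefficient, the sketch below computes a two-way random effects, single-measure ICC from the mean squares of a participants-by-occasions table. It is an illustration of the standard formula, not the SPSS routine used in the study, and the function name is an assumption.

```python
from typing import Sequence


def icc_2_1(scores: Sequence[Sequence[float]]) -> float:
    """ICC(2,1); one row per participant, one column per test occasion."""
    n = len(scores)       # participants
    k = len(scores[0])    # occasions (or raters)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )
```

Because the occasion (column) variance appears in the denominator, a systematic shift between test and retest lowers this coefficient even when participants keep the same rank order.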

Responsiveness

Effect size (ES) and standardized response mean (SRM) were used [39]. The NRS provided an external reference standard. Two measures were compared: the first taken at baseline and the repeated measure taken following a period of anticipated change due to natural healing and therapist intervention. These periods were a partial duration of the injury classification: two weeks for acute participants, four weeks for subacute and six weeks for chronic [9, 11, 40].
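The two responsiveness indices differ only in their denominator, which the following sketch makes explicit. It assumes the usual definitions (mean change divided by the baseline standard deviation for ES, and by the standard deviation of the change scores for SRM); the function names are illustrative and the scores in the usage lines are invented.

```python
from statistics import mean, stdev
from typing import Sequence


def change_scores(baseline: Sequence[float],
                  follow_up: Sequence[float]) -> list:
    """Baseline minus follow-up, so improvement is a positive change."""
    return [b - f for b, f in zip(baseline, follow_up)]


def effect_size(baseline: Sequence[float],
                follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation at baseline."""
    return mean(change_scores(baseline, follow_up)) / stdev(baseline)


def standardized_response_mean(baseline: Sequence[float],
                               follow_up: Sequence[float]) -> float:
    """Mean change divided by the standard deviation of the change."""
    changes = change_scores(baseline, follow_up)
    return mean(changes) / stdev(changes)


# Hypothetical percentage scores for four participants.
baseline_scores = [50, 60, 70, 40]
repeat_scores = [40, 45, 50, 35]
es = effect_size(baseline_scores, repeat_scores)
srm = standardized_response_mean(baseline_scores, repeat_scores)
```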

Measurement error

The minimal detectable change was taken at the 90% level (MDC90) [41].
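As an illustration, the MDC90 is commonly derived from the standard error of measurement (SEM), which in turn depends on the sample standard deviation and the test-retest ICC. The sketch below assumes that common form, MDC90 = z × √2 × SEM with SEM = SD × √(1 − ICC); whether reference [41] uses exactly this expression is not stated here, and the input values in the usage line are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def standard_error_of_measurement(sd: float, icc: float) -> float:
    """SEM from the sample standard deviation and test-retest reliability."""
    return sd * sqrt(1 - icc)


def mdc90(sd: float, icc: float) -> float:
    """Minimal detectable change at the 90% confidence level."""
    z90 = NormalDist().inv_cdf(0.95)  # two-sided 90% level, about 1.645
    # sqrt(2) accounts for measurement error at both test occasions.
    return z90 * sqrt(2) * standard_error_of_measurement(sd, icc)


# Hypothetical inputs: SD of 20 percentage points and an ICC of 0.90.
example_mdc = mdc90(sd=20.0, icc=0.90)
```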

Validity

Face and content validity were determined from the development studies [11, 20, 21] and supported in this study by the practicality focus-group (Figure 2). Criterion or concurrent validity was assessed using a Pearson correlation coefficient. Construct validity was demonstrated by a standard t-test that verified change between the baseline and the repeated measures [11, 20].
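The criterion validity step can be sketched as a Pearson correlation between paired scores from two instruments completed at the same time. This is a plain implementation of the standard product-moment formula; the function name is an assumption.

```python
from math import sqrt
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation between paired scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares about the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)
```

A coefficient close to 1 between two instruments scored in the same direction indicates that they rank participants similarly, which is the sense in which one measure is compared against a criterion.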

Internal consistency

Cronbach's alpha coefficient was used [42, 43].
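A short sketch of the alpha calculation is given below, using the standard formula based on the item variances and the variance of the summed score. Each row is assumed to hold one participant's complete item responses; the function name is illustrative.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha; one row per participant, one column per item."""
    k = len(rows[0])  # number of items
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    # Alpha rises as the items covary, i.e. as the total-score variance
    # exceeds the sum of the individual item variances.
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)
```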

Distribution and normality

This was determined through inspection of the histograms and the one-sample Kolmogorov-Smirnov (KS) test [44].

Factor analysis

The MLE method was used [28], with varimax rotation if two or more factors were determined and coefficient suppression set at 0.5 [44, 45]. Factor extraction was determined a priori by: the scree-plot curve point of inflection [46]; an eigenvalue cut-off of 1.0 [47]; and that ≥ 10% of total explained variance was accounted for where average communality after extraction was ≥ 0.6 [45].
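The two numeric retention rules can be sketched as a simple filter over a list of eigenvalues. The sketch does not perform the extraction itself, and it does not model the scree-plot inspection or the communality condition; the eigenvalues in the usage lines are hypothetical and chosen only so that each set sums to its number of items.

```python
from typing import List, Sequence


def retained_factors(eigenvalues: Sequence[float],
                     eigen_cutoff: float = 1.0,
                     min_variance_share: float = 0.10) -> List[int]:
    """Return the positions (1-based) of factors meeting both rules."""
    # For a correlation matrix the eigenvalues sum to the number of items,
    # so each eigenvalue divided by the sum is its share of total variance.
    total_variance = sum(eigenvalues)
    retained = []
    for position, value in enumerate(eigenvalues, start=1):
        share = value / total_variance
        # Assumption: the cut-off is applied as strictly greater than 1.0.
        if value > eigen_cutoff and share >= min_variance_share:
            retained.append(position)
    return retained


# Hypothetical eigenvalues for a 9-item and an 11-item questionnaire.
nine_item = [5.4, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1]
eleven_item = [5.7, 1.3, 0.9, 0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1, 0.1]
unidimensional = len(retained_factors(nine_item)) == 1
bidimensional = len(retained_factors(eleven_item)) == 2
```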

Sample size

To ensure sufficient sample power to provide an 80% confidence level in determining actual change above 10.5%, the MDC90 for the DASH, the Dawson and Trapp method was used [48], where the required sample size (n) is:

n = [(Za - Zb) × SD / (U1 - U0)]^2

where (U1 - U0) = clinically important difference between the means; SD = standard deviation in the population; Za = two-tailed and Zb = lower-tail values as defined from tables of significance levels.
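As a rough illustration of this kind of calculation, the sketch below assumes the single-mean form n = [(Za - Zb) × SD / (U1 - U0)]^2, with Za taken as the two-tailed value for a 0.05 significance level and Zb as the lower-tail value for 80% power. The standard deviation of 20 percentage points in the usage line is hypothetical and is not a figure from this study.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sd: float,
                         important_difference: float,
                         alpha: float = 0.05,
                         power: float = 0.80) -> int:
    """Participants needed to detect a given difference in a mean."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # two-tailed, about 1.96
    z_b = NormalDist().inv_cdf(1 - power)      # lower tail, about -0.84
    n = ((z_a - z_b) * sd / important_difference) ** 2
    # Round up, since a fraction of a participant cannot be recruited.
    return ceil(n)


# Difference set at the 10.5% MDC90 quoted for the DASH; SD is assumed.
example_n = required_sample_size(sd=20.0, important_difference=10.5)
```

Because Zb is a lower-tail (negative) value, subtracting it adds its magnitude to Za, so the required sample grows as either the significance level is tightened or the desired power is raised.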

Analysis - Practical Characteristics

Missing responses

These were noted as a percentage of total responses.

Completion and scoring time

These were calculated from the average of three separate tests in the practicality focus group.

Readability

The Flesch-Kincaid reading scale was used to determine ease of comprehension and readability [49, 50] and calculated from the grammar function from within the word processing program.

Summary performance

Two clinimetric scales were used. The 25-item 'Measurement of Outcome Measures' considered a measure's characteristics under four categories: methodological, practical, distributional and general. The total was summated and multiplied by four to provide scores from 0 to 100% [11]. The 12-item 'Bot scale' considered twelve individual practical and methodological characteristics of a measure and is scored on a 0-12 scale that can be converted to a percentage [16].

Statistical analysis

The Statistical Package for Social Sciences version 14.0 (SPSS Inc, Chicago, IL) was used to analyze the data on an intention-to-treat principle. Statistical significance was accepted at the p < 0.05 level. Pooled samples of each questionnaire enabled determination of distribution, missing responses, internal consistency and factor analysis.

Ethics

Ethics approval was given by the University of the Sunshine Coast Human Research Ethics Committee.

Results

Factor Structure

The QuickDASH-9, DASH and the ULFI each had a unidimensional structure determined for their factor matrix in both stages, so no varimax rotation occurred. The QuickDASH had a bidimensional structure, invalidating any single summated score and precluding any further valid analysis of its psychometric properties.

Factor loadings for all items in both the QuickDASH and QuickDASH-9 exceeded the 0.50 suppression level. In the calibration stage the QuickDASH-9 and QuickDASH had primary eigenvalues of 5.4 and 5.7 respectively, which accounted for 54% and 62% of variance. In the validation stage these increased respectively to eigenvalues of 6.1 and 6.5, which accounted for 61% and 59% of variance. The QuickDASH-9 factor order was consistent apart from question-item #5 'Use knife', which loaded sixth in the calibration and first in the validation stage (Table 2). The QuickDASH demonstrated identical factor order in both stages (Table 3); however, in addition to the invalid bidimensional structure, one item, 'Limited in work', changed factors and another, 'Socialize', had cross-loading in the validation stage.

Table 2 QuickDASH-9 factor matrix
Table 3 QuickDASH rotated factor matrix

Psychometric properties

These are presented for each PRO in Table 4 with the construct validity in Table 5. The values for the QuickDASH are invalid but are provided as a comparison to the other PROs and to the findings of previous QuickDASH studies.

Table 4 Methodological characteristics of QuickDASH-9, QuickDASH, DASH and ULFI
Table 5 Construct validity comparison between baseline and repeated measures

Distribution

The impairment range of 0-100% was shown for all PROs with the number of 5% histogram increments for the total score being QuickDASH-9 = 19, QuickDASH = 18 and DASH = 17 whilst the ULFI had values in all 20 increments.

Practical Characteristics

Missing responses

These are detailed in Table 4.

Completion and scoring times

The QuickDASH-9 and QuickDASH were respectively 134 ± 56 seconds and 155 ± 64 seconds and both required a computational aid. The ULFI was 132 ± 51 seconds.

Readability

This was found at grade twelve for the QuickDASH-9 and at grade seven for the ULFI.

Summary performance

The 'Measurement of Outcome Measures' score for the QuickDASH-9 was 88%, the DASH was 72% and the ULFI was 96%. On the 12-point Bot scale the score for the QuickDASH-9 was nine (75%), the DASH was seven (58%) and the ULFI was twelve (100%). The QuickDASH was invalid with respective clinimetric scores of 44% and three (25%).

Discussion

This study proposes the QuickDASH-9, with its valid unidimensional structure, as a way to overcome the existing shortcomings of the QuickDASH. This will enable the concept to continue. The modifications that produce the QuickDASH-9 fulfil the original aims of the QuickDASH [20]: a shortened version of the full-length DASH with comparable or preferable psychometric properties, improved practicality and the elimination of item redundancy [9, 11]. In attempting to achieve these aims the QuickDASH produced a bidimensional factor structure. Its validity as a single summated score cannot be supported.

Our findings propose the DASH scoring scale of 1-5 be modified to 0-4 in the QuickDASH-9. This uses the established format of a 0 based anchor rather than a 1 [51]. This should facilitate practicality and ensure consistency of scoring with other PROs.

The bidimensional structure of the QuickDASH, demonstrated in this study using MLE, is consistent with previous findings by Fayad [26] but conflicts with the unidimensional structure found by Imaeda [25]. However, both previous researchers used principal component analysis, which is not recommended [28]. In this study the QuickDASH bidimensional structure demonstrated two factors that can be broadly divided into 'activity' and 'non-activity' items, which supports previous findings [21, 26]. The original DASH has a unidimensional structure [33, 52-54]. This means the reductive process of concept-retention methodology, which reduces the DASH's 30 items to eleven in the QuickDASH, causes a fundamental change in the factor structure [55]. It is critical that a PRO exhibits a unidimensional structure if it is to accurately reflect the measured region with a single summated score [27].

There is a distinct lack of prospective studies of the QuickDASH and no English versions were found that investigated factor structure. Furthermore, reporting of psychometric properties is incomplete if the factor structure is not stated [24, 29-31], and is consequently misleading, with invalid results, if the structure is not unidimensional.

The use of extracted items from the DASH as the sole method to validate the QuickDASH, without prospective testing [20, 21, 26, 32], should only be investigatory. This methodology risks shared measurement error and does not account for part or whole correlation [56], which can lead to type I errors [43]. By completing the prospective aspect of this study on a general upper limb population with a consistent regional reference standard, the ULFI, these error concerns are alleviated for the QuickDASH. However, for the QuickDASH-9 the same criticism applies as it is investigatory research only.

Should the findings of this study be supported by further research, then the QuickDASH-9 would be appropriate to replace the QuickDASH and also the original DASH. Similar proposals are already in place in other body regions. The Neck Disability Index, an advocated PRO, was recently shown to be invalid due to its bidimensional structure [57, 58]. It is proposed that a shortened unidimensional version, the NDI-8, replace the original [58].

The reliability and responsiveness are lower in the QuickDASH-9 compared to the DASH. This is anticipated and consistent with previous QuickDASH findings [20, 21, 26, 32] as the reduction in items from 30 to nine is substantial.

The QuickDASH-9 mean percentage scores were found to be higher than those of the DASH. This supports previous findings that a shortened tool with improved internal consistency will show greater scale width, particularly for higher impairment conditions [21]. The choice of eleven items for the QuickDASH is based on the a priori assumption drawn from the 'Spearman-Brown prophecy'. Specifically, that a minimum of eleven items is required to produce an internal consistency within the clinically accepted range of 0.90 to 0.95 [20]. This study has shown that in a shortened 9-item version, the internal consistency can remain within this range and provide a valid instrument with significant gains in practicality. However, a computational scoring aid is still required.
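The reasoning behind an item-count threshold can be illustrated with the Spearman-Brown formula, which predicts the reliability of a test whose length is changed by a given ratio. The sketch below is a generic illustration and not the calculation reported in [20]; the starting reliability of 0.97 for a 30-item form is hypothetical.

```python
def spearman_brown(reliability: float, length_ratio: float) -> float:
    """Predicted reliability after changing test length by length_ratio.

    length_ratio is new length divided by original length, so values
    below 1 describe a shortened instrument.
    """
    return length_ratio * reliability / (1 + (length_ratio - 1) * reliability)


# Hypothetical: a 30-item form with reliability 0.97 cut to 11 or 9 items.
predicted_11_items = spearman_brown(0.97, 11 / 30)
predicted_9_items = spearman_brown(0.97, 9 / 30)
```

The prediction assumes the retained items are as strongly inter-related as the originals, so a shortened form whose items are more homogeneous can exceed it, while one that mixes constructs can fall below it.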

In both stages of this study the QuickDASH-9 showed inferior psychometric properties to the DASH and ULFI, particularly for reliability and error scores. In relation to the DASH this is outweighed by the gains in practicality and internal consistency, but not in comparison to the ULFI. These findings are reflected in the summary scores of the 'Measurement of Outcome Measures' and the 'Bot scale', which support the preference of the QuickDASH-9 over the DASH. However, both tools remained notably lower than the ULFI on both scales. The ULFI therefore scores as the preferred instrument for both clinical and research purposes due to its practicality and lower missing responses.

Limitations

The study investigated only outpatients presenting to primary care physical therapy practices and further research is required to clarify these findings in an inpatient setting. The findings are general and extrapolation to specific conditions must be made with caution until such conditions are individually investigated. There was a consistent difference in the QuickDASH-9 order of factor loading between the calibration and validation stages. This is most likely from differences in the samples due to the diverse range of diagnoses and duration times used in each stage.

Strengths

The findings have broad implications for use in the general population: they are not specific to one condition or population group, as participants were from general outpatient populations. Two independent population samples were used for data extraction to examine the QuickDASH-9 characteristics. The use of a consistent reference criterion, the ULFI, supports the similarity of findings in the two samples.

Implications for Practice

The QuickDASH-9 as a valid shortened form of the DASH provides a practical approach to measurement of the upper limb. This enhanced practicality reduces the burden to both the patient and clinician, optimizing clinical practice without compromising the accuracy and error measurement capacity of the instrument.

Implications for Research

A prospective validation of the QuickDASH-9 is required in an independent sample using an established criterion, such as the ULFI. Further investigation of the psychometric properties in samples of specific populations and conditions is also required. This could initially be investigative through extraction of responses from existing DASH and QuickDASH studies, with prospective investigation to follow. However, with the summary performance of all forms of the DASH concept shown to be lower than the ULFI, the adoption of the ULFI as a single preferred standard may be preferable.

Conclusions

The unidimensional structure found in the proposed QuickDASH-9 is valid and consistent with the full-length DASH. This achieves the original aim of the QuickDASH, to be a shortened version with comparable or preferable psychometric properties, no item redundancy and higher practicality. The QuickDASH, with a bidimensional structure, is invalid for the production of a summated score. This shortcoming is overcome by the QuickDASH-9. Furthermore, the QuickDASH-9 eliminates item redundancy found in the DASH, improves internal consistency, completion and scoring times and enhances practicality. The QuickDASH-9 offers a viable future option for the DASH concept.