Introduction

Chronic hand conditions often lead to life-long impairments, such as pain, muscle weakness, spasticity, and joint and/or muscle contractures [1,2,3]. These impairments limit a person’s ability to perform activities of daily living [3,4,5] and negatively impact quality of life [4, 6]. Hand orthoses are applied to reduce hand impairments and to improve the performance of daily activities [7,8,9,10]. Annually, about 27,400 upper extremity orthoses are provided in the Netherlands [11], of which a large proportion comprises hand orthoses.

Most people with chronic hand conditions wear a custom-fabricated orthosis on a daily basis, which emphasizes the importance of assessing their experiences with the quality of care. A relevant patient-reported outcome regarding the quality of care is orthosis satisfaction, which has received increasing attention in the field of orthotics in recent years. However, only two of the questionnaires evaluating orthosis satisfaction showed adequate construct validity for upper extremity orthoses [12], namely the Client Satisfaction with Device (CSD) module [13, 14] and the Quebec User Evaluation of Satisfaction with assistive Technology (QUEST) 2.0 [15, 16].

The CSD is one of the five modules of the Orthotics and Prosthetics Users’ Survey (OPUS), assessing persons’ satisfaction with their orthotic or prosthetic device [17]. The CSD has shown good internal consistency and test-retest reliability in spine, and upper and lower extremity orthotic and prosthetic users [14, 18,19,20,21,22,23], and moderate test-retest reliability in lower extremity prosthetic users [24]. The QUEST 2.0 evaluates satisfaction within a wide range of assistive devices and consists of a device and services subscale. The device subscale has shown good internal consistency and test-retest reliability [25]. The QUEST 2.0 has been translated and validated in Dutch (D-QUEST) [15].

The CSD and (D-)QUEST 2.0 have only three aspects in common: weight, comfort and durability. Furthermore, the QUEST was developed as a generic tool for a variety of assistive devices, while the CSD was specifically developed for orthotic and prosthetic devices, addressing for example fitting, skin reactions and aesthetics. Therefore, the CSD and QUEST complement each other when evaluating satisfaction with hand orthoses.

The CSD was originally developed in English [13, 17] and has been translated into and validated in various languages [14, 18,19,20,21,22,23, 26, 27], but so far not in Dutch. Therefore, the aims of this study were to (1) translate and cross-culturally adapt the CSD into the Dutch language, and (2) assess the content validity, structural validity and reliability of the Dutch CSD (D-CSD) in chronic hand orthotic users. Considering the good psychometric properties of the other CSD translations [14, 18,19,20,21,22,23], we hypothesized that the D-CSD would be a valid and reliable tool to measure orthosis satisfaction in our population.

Methods

Study design

This cross-sectional study was conducted according to the COnsensus-based Standards for the selection of health Measurement Instruments (COSMIN) Study Design checklist and reported following the COSMIN reporting guideline [28, 29]. First, we translated and cross-culturally adapted the CSD into the Dutch language, and subsequently, we evaluated the content validity of the D-CSD. Second, we assessed the structural validity and reliability of the D-CSD in chronic hand orthotic users.

Translation and assessment of content validity

Questionnaire

The CSD is a self-administered questionnaire that originally consists of 11 items [17]. Since two items (both dealing with the costs of the devices) did not fit the construct, and these items are irrelevant to the Dutch population because orthoses are reimbursed by health insurance, we used the 9-item version of the CSD as proposed by Jarl et al. [26]. Items were rated on a 5-point Likert scale, ranging from 0 = ‘strongly disagree’ to 4 = ‘strongly agree’. The response option ‘Don’t know/Not applicable’ was not scored numerically. The sum score was calculated using the following formula:

$$\left(\sum \text{item scores}\right)\times \frac{\text{total no. of items}}{\text{no. of items scored numerically}}$$
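The scoring rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the scoring script used in the study: the function name `prorated_sum_score`, the use of `None` for a ‘Don’t know/Not applicable’ response, and the example respondent are all assumptions made for this sketch.

```python
from typing import Optional, Sequence


def prorated_sum_score(item_scores: Sequence[Optional[int]]) -> Optional[float]:
    """Sum of the numeric item scores, scaled up to the full number of items.

    None marks a 'Don't know/Not applicable' response (not scored numerically).
    """
    scored = [s for s in item_scores if s is not None]
    if not scored:
        return None  # no numeric responses, so no sum score can be computed
    if any(s < 0 or s > 4 for s in scored):
        raise ValueError("item scores must lie between 0 and 4")
    # (sum of item scores) x (total no. of items / no. of items scored numerically)
    return sum(scored) * len(item_scores) / len(scored)


# Hypothetical respondent: 9 items, one answered 'Don't know/Not applicable'
example = [3, 4, 2, None, 3, 3, 4, 2, 3]
print(prorated_sum_score(example))
```

The scaling factor keeps sum scores comparable between respondents who skipped different numbers of items, at the cost of assuming that the skipped items would have been rated like the answered ones.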

Translation procedure

The CSD was translated in accordance with the guidelines for translation and cross-cultural adaptation [30]. Two native Dutch speakers (TO and RdJ) independently translated the CSD from English into Dutch. TO was aware of the examined concepts, while RdJ was not aware of the concepts, and had no medical background. Both translators and an independent person (MB) discussed differences between the two translations to reach consensus. Two native English speakers without medical background (AN and JN) independently translated the consensus version from Dutch into English. To avoid information bias, both translators were unaware of the concepts explored. Taking all translations into consideration, an expert committee (i.e. all translators, a methodologist and health care professionals) consensually agreed on a pre-final version of the D-CSD. The developer of the CSD was asked for feedback, and relevant adjustments were made to finalize the D-CSD. Finally, the D-CSD was pretested by determining the content validity in chronic hand orthotic users. Content validity is ‘the degree to which the content of an instrument is an adequate reflection of the construct to be measured’ [31].

Study population

According to the COSMIN guidelines, at least 7 participants are needed to determine content validity [28]. We included 10 chronic hand orthotic users, who were recruited from our outpatient rehabilitation clinic database at Amsterdam UMC, location Academic Medical Center. Eligible participants were characterized as (1) being ≥ 18 years, (2) having a stable, chronic hand condition due to an injury, or a musculoskeletal, neuromuscular or neurological disorder, and (3) permanently wearing a thumb, wrist, or wrist-thumb orthosis for ≥ 3 months, custom-fabricated by an orthopedic company (OIM Orthopedie, the Netherlands). Participants were excluded when they (1) wore an orthosis for a dysfunctional hand, (2) wore a broken orthosis, (3) wore a night orthosis, or (4) had insufficient mastery of the Dutch language.

Procedure content validity

After signing informed consent, demographics and clinical characteristics of the participants were obtained. Subsequently, the content validity of the D-CSD was evaluated by cognitive debriefing. Content validity was judged based on 10 criteria related to relevance, comprehensiveness and comprehensibility (Table 1) [32]. We did not assess criterion 5 (appropriateness of the recall period) since the D-CSD aims to evaluate orthosis satisfaction at present time. Cognitive debriefing was performed by TO and JT using a Three-Step-Test-Interview [33]. Interviews were video-recorded with Microsoft TEAMS. TO and JT additionally judged the relevance, comprehensiveness and comprehensibility of the D-CSD, except for criteria 7 and 8.

Table 1 Ten criteria for rating content validity

Data analysis

Socio-demographics and clinical characteristics of participants were summarized with descriptive statistics. The interviews were transcribed verbatim using MAXQDA 2022. For each participant, information on each criterion was highlighted and recoded into a positive (i.e. agreed with criterion) or negative (i.e. disagreed with criterion) score by two researchers. Each criterion was then rated sufficient (≥ 85% of the participants agreed with the criterion) or insufficient (< 85% of the participants agreed with the criterion) [32, 34].

If the analysis of comprehensiveness pointed out that a certain aspect of orthosis satisfaction was missed, as reported by ≥ 25% of the participants, we formulated an additional item. Thereafter, the comprehensibility and relevance of this item were assessed in three other participants. Subsequently, a rating for relevance, comprehensiveness, and comprehensibility was determined by summarizing the criteria ratings given by the participants and professionals for each component. Content validity was judged sufficient if all three components were rated positive [32, 34].
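The two decision rules used in this analysis (a criterion is sufficient at ≥ 85% agreement; an additional item is formulated when ≥ 25% of participants miss an aspect) can be written down as a short sketch. The function names and the example counts are hypothetical and only illustrate how the thresholds are applied.

```python
def rate_criterion(n_agree: int, n_total: int, threshold: float = 0.85) -> str:
    """Rate one content validity criterion from the share of agreeing participants."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return "sufficient" if n_agree / n_total >= threshold else "insufficient"


def needs_additional_item(n_missing: int, n_total: int, threshold: float = 0.25) -> bool:
    """True if enough participants reported the same missing aspect."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return n_missing / n_total >= threshold


# Hypothetical counts for a sample of 10 participants
print(rate_criterion(9, 10))         # 90% agreement
print(needs_additional_item(3, 10))  # 30% report the same missing aspect
```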

Assessment of structural validity and reliability

Study population

According to the COSMIN guidelines, assessing structural validity requires a sample size of at least 6 times the number of items to obtain sufficient statistical power, and for an adequate assessment of the reliability, a sample size of 50–99 is advised [28]. The CSD version used contains 9 items, and we therefore aimed for 70 participants. Participants were recruited from the database of OIM Orthopedie, supplemented with participants of the feasibility study on 3D-printed hand orthoses [35]. The same inclusion and exclusion criteria as outlined earlier were applied.

Procedure

After obtaining informed consent, the investigator collected demographical and clinical data of the participants. Subsequently, the D-CSD was sent digitally using Castor (Castor EDC, Amsterdam, the Netherlands) or by post (T1). The questionnaire was sent a second time two weeks after the first questionnaire was completed (T2). If necessary, a reminder was sent after one week.

Data analysis

Socio-demographics, clinical characteristics and mean (SD) D-CSD scores at T1 and T2 were summarized with descriptive statistics. Further, floor and ceiling effects were examined, which were defined as being present if at least 15% of participants reached the lowest or highest possible score, respectively [36].
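As an illustration of the floor and ceiling definition above, the following sketch computes the share of participants at each extreme and compares it with the 15% threshold. The function name, the default score range of 0 to 40 (the range of the final 10-item D-CSD) and the example scores are assumptions for this sketch, not study data.

```python
from typing import Dict, Sequence, Union


def floor_ceiling_effects(
    scores: Sequence[float],
    min_score: float = 0,
    max_score: float = 40,
    threshold: float = 0.15,
) -> Dict[str, Union[float, bool]]:
    """Share of participants at the lowest/highest possible score, and whether
    that share reaches the threshold for a floor or ceiling effect."""
    n = len(scores)
    if n == 0:
        raise ValueError("scores must not be empty")
    floor_share = sum(s == min_score for s in scores) / n
    ceiling_share = sum(s == max_score for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share >= threshold,
        "ceiling_effect": ceiling_share >= threshold,
    }


# Hypothetical sample: 1 of 20 participants reaches the maximum score
result = floor_ceiling_effects([40] + [25] * 19)
print(result["ceiling_share"], result["ceiling_effect"])
```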

Structural validity

Structural validity, an aspect of construct validity, is defined as ‘the degree to which the scores of a measurement instrument are an adequate reflection of the dimensionality of the construct to be measured’ [31]. The CSD was designed as a unidimensional construct. To determine the dimensionality of the D-CSD, a principal component analysis (PCA) was performed. To assess whether PCA was appropriate for the present data set, the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy (threshold > 0.70), Bartlett’s test of sphericity (threshold p < 0.05), and the determinant of the correlation matrix (threshold > 0.00001) were determined. The number of meaningful factors was determined with Horn’s parallel analysis (HPA) [37]. Thereafter, a confirmatory factor analysis (CFA) with Weighted Least Squares with Mean and Variance adjustment estimation was performed to assess the fit of the factor model estimated by PCA. The determined dimensionality was considered sufficiently supported, and the model fit good, when the following criteria were met: (1) Comparative Fit Index (CFI) > 0.95, (2) Tucker-Lewis Index (TLI) > 0.95, (3) root mean square error of approximation (RMSEA) < 0.06, and (4) standardized root mean residuals (SRMR) < 0.08 [38].
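Because the four fit criteria point in different directions (CFI and TLI must exceed their cut-off, RMSEA and SRMR must stay below theirs), a small sketch of the decision rule may help. The function name `good_model_fit` and the example values are hypothetical; the fit indices themselves would come from the CFA software.

```python
from typing import Dict, Tuple


def good_model_fit(cfi: float, tli: float, rmsea: float, srmr: float) -> Tuple[bool, Dict[str, bool]]:
    """Check the four fit indices against the cut-offs named in the text."""
    checks = {
        "CFI": cfi > 0.95,      # higher is better
        "TLI": tli > 0.95,      # higher is better
        "RMSEA": rmsea < 0.06,  # lower is better
        "SRMR": srmr < 0.08,    # lower is better
    }
    return all(checks.values()), checks


# Hypothetical fit indices for a one-factor model
overall, per_index = good_model_fit(cfi=0.97, tli=0.96, rmsea=0.04, srmr=0.07)
print(overall, per_index)
```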

Reliability

Reliability, ‘the degree to which the measurement is free from measurement error’, was determined by the measurement properties internal consistency, test-retest reliability, and measurement error [31]. For internal consistency (i.e. the degree of inter-relatedness among the items), Cronbach’s alpha was calculated. A Cronbach’s alpha ≥ 0.70 was considered to reflect good internal consistency [38]. To investigate test-retest reliability, the intraclass correlation coefficient (ICC) and its 95% confidence interval (CI) were calculated using a two-way mixed effects model for a single measurement. Test-retest reliability was considered poor, moderate, good or excellent if the 95% CI of the ICC was less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, or greater than 0.90, respectively [39]. Systematic differences between test scores on the two occasions (đ) and the 95% CI were analyzed with paired-samples t-tests (for normally distributed outcomes) or Wilcoxon signed-rank tests (for non-normally distributed outcomes). To evaluate measurement error, a Bland-Altman plot was constructed and the 95% limits of agreement (LoA) were calculated (đ ± 1.96 × SD of the differences between test occasions) [40]. Also, the standard error of measurement (SEM) and smallest detectable change (SDC) were calculated. The SEMagreement, representing the limits for the smallest change that indicates a real change for a group of individuals, was calculated as √(variance occasions + variance error) [41, 42]. The SDC, indicating the amount of change at individual level that is real and not due to a potential measurement error, was determined using the formula: 1.96 × SEM × √2 [42].

Statistical analyses were performed using R version 4.0.3 with the packages psych, lavaan, BlandAltmanLeh and ggplot2 (R Foundation for Statistical Computing, Vienna, Austria). P < 0.05 was considered significant. Missing values were not imputed. Since the response option ‘Don’t know/Not applicable’ was not scored numerically and therefore marked as missing, available case analysis was used in the factor analyses and Cronbach’s alpha calculation.

Results

Translation and content validity

Translation

The forward and backward translations showed minor variations in wording. Specifically, the expert committee questioned whether the Dutch translation of ‘durable’ in item 6 would be interpreted as sustainable. For item 2, another Dutch word for ‘manageable’ was chosen (in Dutch ‘te hanteren’). Translations of the response options were deemed clear. After consensus was reached, the developer of the CSD made two remarks on the pre-final version: (1) in the original CSD, negation was avoided to minimize response confusion. However, in the D-CSD, we chose to phrase item 7 negatively, since the sentence would otherwise become too long and complex in Dutch, and (2) in line with the expert committee, the developer questioned the Dutch translation of ‘durable’ in item 6 since one backward translation was ‘sustainable’, which might be understood as being made of renewable materials. To prevent confusion, we added a synonym (wear-resistant) to this item in the final version of the D-CSD.

Content validity

The mean (SD) age of the ten participants (9 females) was 58.7 (9.1) years (Table 2). Nine participants were native Dutch, and one participant was native English. The level of education ranged from lower vocational education to university.

All ten participants (100%) and the professionals indicated that the D-CSD items were relevant for the construct of interest, target population and context of use. Furthermore, the response options were considered appropriate.

Although two participants each questioned one item (item 1 and item 7, respectively), ≥ 85% of the participants agreed that the instruction, items and response options were comprehensible. Also, the professionals agreed that the items were appropriately worded and that the response options matched the items.

Three out of ten participants (30%) missed an item about cleaning the orthosis. The professionals agreed on the relevance of this item since the orthosis is generally worn daily. Therefore, the item ‘My prosthesis/orthosis is easy to clean’ was formulated and positively judged on relevance and comprehensibility by three other participants. By adding this item to the final D-CSD, thus including 10 items (score range from 0 to 40 points, with a higher score indicating a higher satisfaction), the comprehensiveness was rated sufficient.

Since the relevance, comprehensibility, and comprehensiveness were all sufficient, the content validity of the D-CSD was judged sufficient.

Structural validity and reliability

We invited 425 people from the OIM Orthopedie database, of whom 85 were interested in participating. Fifty-five persons met the inclusion and exclusion criteria. Combined with 21 participants of the feasibility study, 76 participants were included (Fig. 1). Their demographics and clinical characteristics are presented in Table 2. Because the period between T1 and T2 was 3 months for one participant and no T2 data were received from another participant, these two participants were excluded from the test-retest reliability analysis. Based on 74 participants, the mean (SD) D-CSD score at T1 and T2 was 26.8 points (6.47) and 25.9 points (6.37), respectively. No floor or ceiling effects were observed: none of the participants obtained the lowest possible score, and only 5% of the participants obtained the highest possible score at T1 and none at T2.

Fig. 1 Participant flow chart

Table 2 Demographic and clinical characteristics of the participants

Structural validity

The KMO indicated good sampling adequacy (KMO = 0.82), Bartlett’s test was significant (p < 0.001), and the correlation matrix determinant was 0.069. HPA indicated the presence of one factor. PCA showed item loadings ranging from 0.44 to 0.72 (Table 3). The factor explained 39% of the variance. CFA demonstrated a good one-factor model fit, with all fit indices meeting the reference criteria (CFI = 1.00, TLI = 1.03, RMSEA < 0.001, SRMR = 0.06).

Table 3 Factor loadings of D-CSD items

Reliability

Cronbach’s alpha was 0.82 (95% CI 0.75–0.87), indicating good internal consistency. Dropping an item did not improve the internal consistency (range with one item deleted: 0.78–0.82). There was no significant difference between the two occasions (mean (SD) difference: 0.86 points (4.00); 95% CI -0.06 to 1.79; p = 0.07). The ICC was 0.81 (95% CI 0.71–0.87), indicating moderate to good test-retest reliability. The Bland-Altman plot, including the 95% LoA (-6.99 to 8.71), is shown in Fig. 2. The SEM and SDC were 2.88 and 7.98 points, respectively.
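As a consistency check, the reported SDC and limits of agreement can be reproduced from the rounded values given above using the formulas from the Methods. The symbol $\bar{d}$ for the mean difference is introduced here for the illustration; small deviations in the last digit are expected because the inputs are rounded.

$$\text{SDC} = 1.96 \times \text{SEM} \times \sqrt{2} = 1.96 \times 2.88 \times 1.414 \approx 7.98$$

$$\text{LoA} = \bar{d} \pm 1.96 \times \text{SD}_{\text{diff}} = 0.86 \pm 1.96 \times 4.00 = 0.86 \pm 7.84 \;\Rightarrow\; (-6.98,\ 8.70)$$

Both results agree with the reported SDC of 7.98 points and with the reported LoA of -6.99 to 8.71 up to rounding.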

Fig. 2 Bland-Altman plot. The green line indicates the mean difference between T1 and T2. Red lines indicate the 95% limits of agreement

Discussion

The results of this study showed sufficient content and structural validity of the D-CSD in our sample of chronic users of hand orthoses. Further, good internal consistency, moderate to good test-retest reliability and acceptable measurement error were found. Sensitivity to detect changes on individual level was limited.

With regard to content validity, the components relevance, comprehensibility and comprehensiveness were all rated sufficient. Although one item was deliberately phrased negatively, comprehensibility was not adversely affected. Some participants indicated missing an item on cleaning of the orthosis, which was therefore included in the final version of the D-CSD. Including this item did not affect the internal consistency, since Cronbach’s alpha remained constant when this item was dropped. Although PCA showed that this item had the lowest factor loading (0.44), it was above the cut-off value of 0.40, and thus considered acceptably correlated to the construct measured [43]. A recent scoping review on the psychometric properties of the CSD suggested discarding item 7 on ‘wear and tear on clothing’ [44], as it did not fit the model in Rasch analysis in three studies in lower and upper limb orthotic and prosthetic users [13, 17, 26]. Based on our clinical experience, clothing wear and tear is a well-known problem in the target population, which was also indicated by participants during the cognitive debriefing. Furthermore, our PCA showed adequate factor loading of this item, and dropping it would have lowered the internal consistency, supporting our decision to retain this item in the D-CSD.

Regarding structural validity, and in line with our hypothesis, the results of the PCA indicated a one-factor model, consistent with the English 9-item CSD, and which has also been demonstrated in five other studies in persons with spine, and upper and lower extremity orthotics and prosthetics [14, 18, 21,22,23, 26]. The explained variance of 39% is within the range of 37–88% reported in these studies. We furthermore confirmed the unidimensionality of the D-CSD as the CFA resulted in adequate one-factor model fit indices. No other studies have been conducted using CFA in assessing the structural validity of the CSD to compare our results with.

The D-CSD showed good internal consistency, which is in line with our hypothesis and comparable to earlier findings in persons with spine, and upper and lower extremity orthotics and prosthetics [14, 18, 19, 21,22,23]. Furthermore, the D-CSD showed moderate to good test-retest reliability, indicating that the D-CSD can adequately distinguish persons with high and low orthosis satisfaction scores. The ICC found in our study is comparable with studies on the Persian, Swedish and Turkish CSD [19, 20, 23], and much higher than the reported ICC of 0.50 for the English CSD [24]. Reliability in this latter study, however, was assessed in a sample of veterans wearing unilateral lower limb prostheses, who, compared to our sample of chronic hand orthotic users with a variety of diagnoses and impairments, might represent a more homogeneous group. This probably resulted in less between-subject variance, thereby lowering the ICC. Overall, it should be noted that previous studies examining the psychometric properties of the CSD, including reliability, used different populations, sample sizes, and CSD versions (i.e. number of items, response options and scoring systems), which limits a fair comparison with our results.

Despite good test-retest reliability and an acceptable SEM of 2.88 points (11% of the pooled mean D-CSD score), the SDC was relatively high, i.e. an individual needs to change > 7.98 points (30% of the pooled mean D-CSD score) to ensure the detection of a true change. Although different populations, sum scores and SEM calculations were used, this is within the range of 16–34% of the mean CSD score reported in earlier studies [19, 20, 23, 24]. Ideally, the SDC should not exceed the minimal important change (MIC), a threshold for a minimal within-person change over time above which persons perceive themselves importantly changed [38]. Unfortunately, no research has been performed on the MIC of the CSD. As a rule of thumb, it has been suggested that the MIC can be estimated as 10% of the maximum score of a measurement [45]. In our study, this would result in a MIC of 4 points, which is far below our SDC of 7.98 points, indicating limited applicability of the CSD to detect important changes in orthosis satisfaction at the individual level. For clinical practice, detecting smaller changes, or changes below the MIC, requires an outcome measure with a smaller SDC. This can be achieved by increasing the number of measurements to overcome the problem of large measurement error [42]. Future research should focus on assessing the effect of using multiple repeated measurements over time on the SDC in so-called G-studies and D-studies [46], and on determining the MIC of the CSD in chronic hand orthotic users to compare these two outcomes adequately. Furthermore, we investigated the validity and reliability of the D-CSD in hand orthotic users, yet the CSD also targets lower extremity orthotic users and upper and lower extremity prosthetic users. Future studies are therefore needed to investigate the psychometric properties of the D-CSD in these populations.
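The percentages and the rule-of-thumb MIC quoted in this paragraph follow from simple arithmetic on the reported values. The worked steps below assume that the pooled mean is the simple average of the T1 and T2 means (26.8 and 25.9 points), which is an assumption of this illustration, and use the maximum D-CSD score of 40 points.

$$\text{pooled mean} \approx \frac{26.8 + 25.9}{2} = 26.35$$

$$\frac{\text{SEM}}{\text{pooled mean}} = \frac{2.88}{26.35} \approx 0.11, \qquad \frac{\text{SDC}}{\text{pooled mean}} = \frac{7.98}{26.35} \approx 0.30$$

$$\text{MIC} \approx 0.10 \times 40 = 4 \text{ points} < \text{SDC} = 7.98 \text{ points}$$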

Strengths and limitations

A strength of our study was the specific attention given to the content validity. This type of validity is considered the most important measurement property, indicating whether questionnaire items are relevant, comprehensive, and comprehensible with respect to the construct of interest and study population [32], which was shown in our study. Furthermore, since this study was conducted in a heterogeneous sample (i.e. diversity of diagnoses) of chronic hand orthotic users, wearing the three most commonly prescribed types of hand orthoses, we are confident that the results can be generalized to the population of chronic hand orthotic users at large.

Our study also has some limitations. Although we invited 425 persons to participate in our study, no more than 85 persons were interested in participating. Due to this low response rate (20%), combined with the 21 participants specifically willing to participate in our feasibility study on 3D-printed orthoses, selection bias could have occurred. In addition, a larger sample size, ideally ≥ 100 participants [28], could have resulted in higher precision of the validity and reliability estimates.

Conclusion

We showed sufficient content validity and structural validity, and good reliability of the D-CSD in Dutch chronic hand orthotic users. Given the relatively high SDC, sensitivity to detect changes in orthosis satisfaction over time at the individual level is limited. Yet, based on the SEM, the D-CSD is considered a useful tool to assess satisfaction with hand orthoses at the group level in this population.