Background

Functioning and disability are increasingly recognized as relevant outcomes of studies in patients with chronic health conditions [1]. The World Health Organization Disability Assessment Schedule 2.0 (WHODAS 2.0) was developed to enable a standardized assessment of the consequences of any disease that affects an individual's functioning and disability [2],[3]. The questionnaire is conceptually based on the International Classification of Functioning, Disability and Health (ICF) [4], which treats functioning and disability as concepts independent of medical diagnoses and neutral with respect to etiology. Reflecting a biopsychosocial perspective, the ICF considers impairments of body structures and body functions, limitations in activities, and restrictions in participation, as well as the contextual factors (personal and environmental) that influence them.

The WHODAS 2.0 questionnaire is available in different forms depending on the number of items (6, 12, 24, 12 + 24, and 36 items), the mode of administration (self-administered or interview), and the respondent (subject, clinician, caregiver) [5]. The 36-item WHODAS 2.0 covers six domains: understanding and communication, getting around, self-care, getting along with people, life activities, and participation in society. Its psychometric properties have been tested in a number of studies with population samples [6] and with persons with different health conditions [7]-[10].

The 12-item WHODAS 2.0 covers all six domains of the 36-item form. Two versions of the 12-item WHODAS 2.0 exist: the original version [6] and a revised version in which five items were replaced. The original version was reported to correlate highly (r = 0.95) with the 36-item form and to explain more than 90% of its variance [6], whereas the revised version currently in use was reported to explain 81% of that variance (http://www.who.int/classifications/icf/whodasii/en/index3.html). Completing the 12-item form takes about five minutes, so it can be expected to serve as a feasible screening instrument, particularly for population surveys. However, the feasibility of the self-administered 12-item WHODAS 2.0 has not yet been investigated. Moreover, in contrast to the 36-item form, studies on the psychometric properties of the 12-item form are still scarce; they are limited to the general population [11], older people [12], and patients with depression [13]-[15].

Persons with coronary heart disease (CHD) and its complications, such as acute myocardial infarction (AMI), are at risk of developing disability in the long-term course of the disease [16],[17]. Available studies investigating functioning and disability associated with CHD have often used a restricted assessment approach. For instance, the physical function subscale of the Short Form 36 Health Survey (SF-36) [18],[19], the EuroQol 5D (EQ-5D) [18], single questions selected from the Functional Health Scale [17],[20],[21], or self-developed questions [16],[22] were applied to assess disability in persons with AMI. A standardized, internationally accepted instrument that covers disability from the biopsychosocial perspective would therefore be useful to provide a feasible, reliable, and valid measurement of disability in patients with AMI.

Consequently, the objective of this study was to investigate feasibility and psychometric properties of the 12-item self-administered WHODAS 2.0 in a large population-based sample of persons with AMI.

Methods

To examine the feasibility and psychometric properties of the 12-item self-administered WHODAS 2.0 in persons with AMI, we carried out multivariate logistic regression modeling and Rasch analysis using data from the population-based Augsburg Myocardial Infarction Registry. The registry was implemented in 1984 as part of the WHO-MONICA (Monitoring Trends and Determinants in Cardiovascular Disease) project [23]. After the termination of MONICA in 1995, the registry became part of the framework of KORA (Cooperative Health Research in the Region of Augsburg, Germany). Since 1984, all cases of coronary death and non-fatal AMI in the 25-74-year-old study population of the city of Augsburg and the two adjacent counties (about 600,000 inhabitants) have been continuously registered. Data sources for hospitalized patients include eight hospitals within the study region and two hospitals in the adjacent areas. Approximately 80% of all AMI cases of the study region are treated in the region's major hospital, Klinikum Augsburg, a tertiary care centre offering invasive and interventional cardiovascular procedures as well as heart surgery facilities [23],[24]. Methods of case finding, diagnostic classification of events, and data quality control have been described elsewhere [23],[24]. Patients are routinely interviewed by trained study nurses with a standardized questionnaire during their hospital stay, after transfer from the intensive care unit. The interviews cover demographic data, risk factors, and comorbidities. Further data on clinical variables, comorbidities, treatment, and in-hospital course are obtained by chart review.

The study was approved by the Ethics Committee of the Bavarian Chamber of Physicians and was performed in accordance with the Declaration of Helsinki. Participants gave written informed consent before inclusion in the study.

Sample

The target sample consisted of all 3740 patients with AMI included in the MONICA/KORA Myocardial Infarction Registry, Augsburg, Germany, between 2000 and 2008 who were alive on 1 July 2011. Of these, 1266 persons had previously declined further participation. A postal questionnaire was sent to the remaining 2474 persons; it included questions on current health status, comorbidities, medication, and health care, as well as the German version of the 12-item WHODAS 2.0 (revised version). Reminders were sent to the 1194 persons who had not responded by September 2011. Persons who still failed to respond were reminded by telephone. Thirty persons could not be reached because they had died, 63 declined participation, 38 were not available, three were not known at the available address, and 243 could not be reached by telephone for other reasons (e.g., no telephone connection). The final sample consisted of 2077 men and women aged 35-85 years with first or recurrent AMI who responded to the questionnaire. Compared with the persons who did not participate for any reason, this sample had a similar sex distribution (22.2% versus 23.0% women) but a slightly higher mean age (66.6 ± 9.5 years versus 64.0 ± 11.9 years).

Data used for analyses

The following data and measures were used for the data analyses:

(1) Data obtained from the patient interview and/or chart review during the hospital stay, namely sex, age at infarction, education according to the German school system (dichotomized into ≤9 vs. >9 years of schooling), marital status (married vs. not married), history of hypertension, diabetes, angina pectoris, hyperlipidemia, smoking (current smoker, ex-smoker, never smoker), previous AMI, AMI type (ST-segment elevation MI, non-ST-segment elevation MI, bundle branch block), and reperfusion treatment (thrombolysis, bypass surgery, percutaneous coronary intervention with or without stenting, no reperfusion therapy).

(2) Data from a postal follow-up questionnaire requesting information on current age, re-infarction, and diabetes. Comorbidities were assessed using a modified version of the Self-administered Comorbidity Questionnaire [25], which requests information on the presence of 13 chronic health conditions. In addition, patients were asked to name any other diseases they had. A variable reflecting the sum of all named comorbidities was constructed and dichotomized into "no comorbidities" and "at least one comorbidity". Overall health status was measured by the question "How do you rate your general health state in the past 30 days?", with the response options "very good", "good", "moderate", "bad", and "very bad". For the analyses, "very good" and "good", as well as "bad" and "very bad", were collapsed.

(3) The self-administered 12-item WHODAS 2.0 as an additional part of the postal questionnaire. For each item, respondents indicated the level of difficulty experienced during the previous 30 days on a five-point scale (none, mild, moderate, severe, extreme/cannot do). According to the standard scoring algorithm, a total score was calculated for persons who answered at least 10 of the 12 questions by summing all items, with up to two missing items replaced by the mean score of the remaining items [5].
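The sum-score rule with mean substitution can be sketched as follows. This is a minimal illustration under stated assumptions, not the official WHO scoring syntax: the function name `whodas12_sum_score` is hypothetical, each item is assumed to be stored as a numeric code with `None` marking a missing answer, and the numeric coding of the five response levels is left to the caller.

```python
from typing import Optional, Sequence


def whodas12_sum_score(items: Sequence[Optional[float]], max_missing: int = 2) -> Optional[float]:
    """Hypothetical helper: 12-item WHODAS 2.0 sum score with mean substitution.

    `items` holds 12 numeric item codes; None marks a missing answer.
    Returns None when more than `max_missing` items are missing.
    """
    if len(items) != 12:
        raise ValueError("expected exactly 12 item responses")
    answered = [x for x in items if x is not None]
    n_missing = len(items) - len(answered)
    if n_missing > max_missing:
        return None  # too many gaps: no total score is computed
    # Replace each missing item by the mean of the answered items, then sum.
    mean_answered = sum(answered) / len(answered)
    return sum(answered) + n_missing * mean_answered
```

For example, with ten answered items summing to 8 and two missing items, the score is 8 + 2 × 0.8 = 9.6; a respondent with three missing items receives no score, which is why only respondents with at least 10 answered items enter a scored analysis.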

Data analysis

Descriptive analyses were carried out to describe the sample in terms of socio-demographic and clinical characteristics. Absolute and relative frequencies of the responses to each WHODAS 2.0 item were calculated.

Feasibility

Feasibility of the WHODAS 2.0 was assessed by the number of missing items. Associations of incomplete WHODAS 2.0 questionnaires (1 to 12 missing items) with age, sex, education, marital status, presence of comorbidities, and overall health status rating were first described with descriptive statistics and tested using χ² tests, and then analyzed with multivariate logistic regression models. Odds ratios (OR) and their 95% confidence intervals (CI) were reported to describe differences between persons who had left some or all items blank and those who returned a questionnaire without missing items. Analyses were performed using SAS version 9.2 (SAS Institute Inc., Cary, North Carolina).
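The odds ratios and confidence intervals come directly from the logistic regression coefficients. SAS reports them itself; the sketch below only shows the underlying arithmetic, assuming a Wald interval (profile-likelihood intervals, which some software offers, differ slightly). The function name and the example coefficient are hypothetical.

```python
import math


def odds_ratio_wald_ci(beta: float, se: float, z: float = 1.959964) -> tuple:
    """Odds ratio and Wald confidence interval from a logistic-regression coefficient.

    beta: estimated log-odds coefficient; se: its standard error;
    z: standard normal quantile (1.959964 gives a 95% interval).
    """
    odds_ratio = math.exp(beta)
    lower = math.exp(beta - z * se)  # interval is symmetric on the log-odds scale
    upper = math.exp(beta + z * se)
    return odds_ratio, lower, upper
```

A hypothetical coefficient of 0.5 with standard error 0.2 gives OR ≈ 1.65 with a 95% CI of roughly 1.11 to 2.44; an interval that excludes 1 corresponds to a significant association with incomplete questionnaires at the 5% level.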

Psychometric properties

Psychometric properties were examined using Rasch analysis [26]. Only patients for whom a 12-item sum score could be calculated [5] were included, so the data of 1995 patients were considered for further analysis. Rasch models assume an underlying latent trait, which in the case of the WHODAS 2.0 is disability, and on which both item difficulty and person ability are located [26]. Because our items are ordinal and polytomous, we chose a Partial Credit Model (PCM) (also called the Polytomous Rasch Model).
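In the PCM, the probability that a person at location θ responds in category k of an item with thresholds δ_1, ..., δ_m is proportional to exp(Σ_{j≤k}(θ − δ_j)), with category 0 having numerator exp(0). The Python sketch below computes these category probabilities; the function name is hypothetical, and software packages differ in sign and normalization conventions, so the numbers are illustrative only.

```python
import math
from typing import List, Sequence


def pcm_category_probs(theta: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities P(X = 0..m) under the Partial Credit Model."""
    # Numerator for category k is exp(sum_{j<=k} (theta - delta_j)); category 0 has exp(0).
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

At θ equal to the k-th threshold, categories k − 1 and k are equally likely, which is the usual interpretation of a threshold as the point on the latent trait where the item separates adjacent response levels.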

Rasch analysis was performed in the following steps:

(1) Testing of model assumptions: We tested the model assumptions of unidimensionality, monotonicity, and local independence.

Unidimensionality was examined using bifactor analysis of the polychoric correlation matrix [27],[28]. Bifactor analysis presumes one general factor and multiple group factors, all independent of each other. High loadings of all items on the general factor, exceeding their loadings on the group factors, indicate an underlying unidimensional latent trait. The number of factors considered in the bifactor analysis was determined by permuted parallel analysis [29].
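The idea behind permuted parallel analysis can be sketched with NumPy: eigenvalues of the observed item correlation matrix are compared with those obtained after permuting each item column independently, which destroys inter-item correlation while preserving each item's distribution. This is a simplified sketch, not the study's implementation: it uses Pearson rather than polychoric correlations, and the function name, the number of permutations, and the 95th-percentile benchmark are assumptions.

```python
import numpy as np


def permuted_parallel_analysis(data, n_perm=200, quantile=95.0, seed=0):
    """Number of factors whose observed eigenvalue exceeds the permutation benchmark.

    data: persons x items array. Pearson correlations are used here as a
    simplification; the study itself used polychoric correlations.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(data, dtype=float)
    n_items = x.shape[1]

    def sorted_eigenvalues(m):
        # eigvalsh returns ascending eigenvalues of the symmetric correlation matrix.
        return np.linalg.eigvalsh(np.corrcoef(m, rowvar=False))[::-1]

    observed = sorted_eigenvalues(x)
    permuted = np.empty((n_perm, n_items))
    for b in range(n_perm):
        # Permute each column independently to break the association between items.
        shuffled = np.column_stack([rng.permutation(x[:, j]) for j in range(n_items)])
        permuted[b] = sorted_eigenvalues(shuffled)
    benchmark = np.percentile(permuted, quantile, axis=0)

    # Retain factors while the observed eigenvalue beats its permuted counterpart.
    n_factors = 0
    for obs, ref in zip(observed, benchmark):
        if obs <= ref:
            break
        n_factors += 1
    return n_factors
```

Data generated from a single common factor should yield one retained factor; the retained number then fixes how many group factors enter the bifactor model.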

Monotonicity was explored for each item by inspecting graphs of the item's response distribution conditional on average "rest-scores". For each item, the rest-score was calculated as the total raw score of all remaining non-missing items divided by their number. If persons with higher rest-scores consistently tend to have more problems on the given item, monotonicity can be assumed.
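The rest-score check can also be expressed as a table instead of a graph. The sketch below, under the assumption that persons are split into equal-sized groups ordered by rest-score (the number of groups is a hypothetical choice), computes the mean item score per group and tests whether these means never decrease; both function names are illustrative.

```python
from typing import List, Optional, Sequence


def item_means_by_rest_group(responses: Sequence[Sequence[Optional[int]]],
                             item: int, n_groups: int = 4) -> List[float]:
    """Mean score on `item` within groups of persons ordered by rest-score.

    The rest-score is the mean of a person's other non-missing items.
    Persons missing `item` (or all other items) are skipped.
    """
    pairs = []
    for row in responses:
        if row[item] is None:
            continue
        others = [x for k, x in enumerate(row) if k != item and x is not None]
        if others:
            pairs.append((sum(others) / len(others), row[item]))
    pairs.sort(key=lambda pair: pair[0])
    size = len(pairs) / n_groups
    means = []
    for g in range(n_groups):
        chunk = pairs[round(g * size):round((g + 1) * size)]
        if chunk:
            means.append(sum(score for _, score in chunk) / len(chunk))
    return means


def is_monotone(means: Sequence[float], tolerance: float = 0.0) -> bool:
    """True if group means never decrease by more than `tolerance`."""
    return all(b >= a - tolerance for a, b in zip(means, means[1:]))
```

A small tolerance can absorb sampling noise in sparsely populated groups, mirroring the visual judgement of a "consistent trend" rather than a strict ordering.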

Local independence was examined based on the residual correlations among items from a single-factor analysis [30]. High residual correlations suggest that the response to one item influences the response to another.
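Under a single-factor model, the implied correlation between items i and j is λ_i λ_j, so the residual correlation is the observed correlation minus that product. The sketch below assumes the loadings have already been estimated (for example from a factor analysis of the polychoric correlation matrix) and uses an assumed flagging cutoff of 0.2; the function names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def residual_correlations(corr: Sequence[Sequence[float]],
                          loadings: Sequence[float]) -> Dict[Tuple[int, int], float]:
    """Observed minus model-implied correlations under a single-factor model."""
    p = len(loadings)
    return {(i, j): corr[i][j] - loadings[i] * loadings[j]
            for i in range(p) for j in range(i + 1, p)}


def flag_local_dependence(corr, loadings, cutoff: float = 0.2) -> List[Tuple[int, int]]:
    """Item pairs whose absolute residual correlation exceeds `cutoff`."""
    return [pair for pair, r in residual_correlations(corr, loadings).items()
            if abs(r) > cutoff]
```

A flagged pair indicates that the two items share variance beyond the common factor, for example because they refer to similar social situations.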

(2) Computing and fitting the Rasch model: After evaluating the model assumptions, the PCM was fitted. The PCM provides an item location, or overall item difficulty, for every item [31]. In addition, item thresholds are computed for each item; a threshold is the location on the latent trait at which two adjacent response categories are equally likely, i.e., where the item best discriminates between persons [31]-[34]. In case of unordered thresholds, the response options were to be collapsed until the thresholds were in the correct order before further testing of the model. Should collapsing be necessary, we decided to collapse all items identically for better comparability among items.
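The ordering check and an identical recoding for all items can be sketched as below. The mapping shown is the 01122 strategy reported in the Results; the assumption that the five options are coded 0 (none) to 4 (extreme/cannot do), the function names, and the example threshold values are illustrative. In practice the thresholds would come from the fitted model.

```python
from typing import List, Optional, Sequence

# Recoding of five response options (0 = none ... 4 = extreme/cannot do) into three.
COLLAPSE_01122 = (0, 1, 1, 2, 2)


def thresholds_ordered(thresholds: Sequence[float]) -> bool:
    """True if the estimated thresholds increase strictly along the latent trait."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


def collapse_responses(responses: Sequence[Sequence[Optional[int]]],
                       mapping: Sequence[int] = COLLAPSE_01122) -> List[List[Optional[int]]]:
    """Apply the same category mapping to every item; missing values stay missing."""
    return [[None if x is None else mapping[x] for x in row] for row in responses]
```

Applying one mapping to all items, rather than item-specific recodings, keeps the items comparable, at the cost of possibly collapsing categories that were already well ordered for some items.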

Finally, we examined item fit based on (unweighted) outfit and (weighted) infit mean squares and created graphics to aid the interpretation of these measures [26]. Both mean squares are interpreted in the same way: 1) mean squares close to 1 indicate good item fit; 2) mean squares much larger than 1 indicate underfit (i.e., the observed data vary much more than the model can explain, which constitutes a severe violation of the model); 3) mean squares much smaller than 1 indicate overfit (i.e., the data vary much less than expected under the model, which is usually accepted). Different cut-offs for identifying mean squares that are too large or too small can be found in the literature [26],[35],[36]. Values between 0.70 and 1.3 are usually considered reasonable, but appropriate cut-offs depend on the sample size, the number of items, and the number of response options. Therefore, to better judge item fit, we created graphics comparing 1) the expected probabilities of responding above a certain threshold with 2) the observed response frequencies for groups of persons with similar ability estimates.
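The two mean squares can be written out explicitly: with observed score x, model-expected score E, and model variance W for each person, outfit is the unweighted mean of (x − E)²/W, and infit is Σ(x − E)² / ΣW, which weights each residual by its information. The sketch below treats person locations as known, whereas in practice estimated locations from the fitted model are used (eRm, for instance, provides `itemfit()`). The function names and the simulation helper are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def pcm_probs(theta: float, thresholds: Sequence[float]) -> List[float]:
    """PCM category probabilities (category k numerator: exp(sum_{j<=k}(theta - delta_j)))."""
    logits = [0.0]
    for delta in thresholds:
        logits.append(logits[-1] + theta - delta)
    top = max(logits)
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


def item_mean_squares(scores: Sequence[Optional[int]], thetas: Sequence[float],
                      thresholds: Sequence[float]) -> Tuple[float, float]:
    """Return (outfit, infit) mean squares for one item."""
    squared_residuals, variances = [], []
    for x, theta in zip(scores, thetas):
        if x is None:
            continue
        p = pcm_probs(theta, thresholds)
        expected = sum(k * pk for k, pk in enumerate(p))
        variance = sum((k - expected) ** 2 * pk for k, pk in enumerate(p))
        squared_residuals.append((x - expected) ** 2)
        variances.append(variance)
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(variances)
    # Infit: squared residuals weighted by their model variance (information).
    infit = sum(squared_residuals) / sum(variances)
    return outfit, infit


def simulate_scores(thetas: Sequence[float], thresholds: Sequence[float], seed: int = 0) -> List[int]:
    """Draw item scores from the PCM (used here only to check the fit statistics)."""
    rng = random.Random(seed)
    categories = range(len(thresholds) + 1)
    return [rng.choices(categories, weights=pcm_probs(t, thresholds))[0] for t in thetas]
```

For data simulated from the model itself, both statistics land close to 1; values well above the rough 1.3 bound would signal underfit, and values well below 0.70 overfit.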

Rasch analysis was performed in R [37] using the package eRm [38].

(3) Testing for Differential Item Functioning (DIF): We tested for DIF by sex, age (above or below 65 years), education, marital status, presence of comorbidities, overall health status, and smoking status. DIF is present if different groups (e.g., older and younger patients) respond differently to an item despite equal levels of the underlying characteristic being measured [31]; items showing DIF are therefore a potential source of bias in person measurement. Because of the large sample size, a change in McFadden's pseudo R² of 0.02 was chosen as the threshold for flagging items. For testing DIF we used the R package lordif [39].
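The pseudo R² criterion compares nested ordinal logistic regression models for an item: one with the latent trait only and one that additionally includes group terms (lordif fits such nested models per item). The sketch below covers only the final comparison step; the log-likelihood values are hypothetical inputs, and the function names are illustrative.

```python
def mcfadden_r2(loglik_model: float, loglik_null: float) -> float:
    """McFadden's pseudo R^2 = 1 - logL(model) / logL(intercept-only model)."""
    return 1.0 - loglik_model / loglik_null


def dif_flag(loglik_null: float, loglik_base: float, loglik_dif: float,
             cutoff: float = 0.02) -> tuple:
    """Change in pseudo R^2 when group terms are added, and whether it reaches `cutoff`."""
    delta = mcfadden_r2(loglik_dif, loglik_null) - mcfadden_r2(loglik_base, loglik_null)
    return delta, delta >= cutoff
```

With a large sample, adding group terms almost always improves the likelihood significantly, so an effect-size criterion such as the change in pseudo R² avoids flagging items whose DIF is statistically detectable but practically negligible.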

(4) Assessment of concurrent validity: For the final Rasch model, person ability was transformed into a score ranging from 0 to 100 (with 0 corresponding to perfect functioning/no disability) to facilitate the interpretation of group differences. A linear additive model [40] was estimated to predict this disability score from sex, age, education, marital status, presence of comorbidities, and overall health status as independent variables.
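The exact anchors of the 0-100 transformation are not stated here. A minimal sketch, assuming a linear mapping between the lowest and highest person locations on the logit scale, oriented so that higher values mean more disability, could look like this (the function name and anchor values are hypothetical):

```python
def to_0_100(theta: float, theta_min: float, theta_max: float) -> float:
    """Linearly map a person location (logits) onto 0-100; theta_min maps to 0."""
    if theta_max <= theta_min:
        raise ValueError("theta_max must exceed theta_min")
    return 100.0 * (theta - theta_min) / (theta_max - theta_min)
```

Because the mapping is linear, differences on the 0-100 scale remain proportional to differences in logits, so group differences keep the interval-scale interpretation of the Rasch metric.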

Age was modeled in a flexible, non-parametric way using P-splines. Concurrent validity can be assumed if, for example, persons with comorbidities or worse overall health status have a higher expected level of disability (i.e., higher score values) than those without comorbidities and those rating their health as good or very good. The linear additive model was estimated using the R package mgcv [40].
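A P-spline is a B-spline regression with a difference penalty on adjacent coefficients. The single-covariate NumPy sketch below illustrates the idea; unlike mgcv, which estimates the smoothing parameter from the data and combines smooth and parametric terms, it uses a fixed, hypothetical smoothing parameter `lam` and equally spaced knots. The function names are illustrative.

```python
import numpy as np


def bspline_basis(x, knots, degree):
    """B-spline basis matrix (rows: observations, columns: basis functions) via Cox-de Boor recursion.

    Assumes strictly increasing (here: equally spaced) knots, so no denominator is zero.
    """
    x = np.asarray(x, dtype=float)
    t = np.asarray(knots, dtype=float)
    # Degree 0: indicator of each half-open knot interval.
    basis = np.array([(t[i] <= x) & (x < t[i + 1]) for i in range(len(t) - 1)], dtype=float).T
    for d in range(1, degree + 1):
        n_basis = len(t) - 1 - d
        new = np.zeros((len(x), n_basis))
        for i in range(n_basis):
            left = (x - t[i]) / (t[i + d] - t[i]) * basis[:, i]
            right = (t[i + d + 1] - x) / (t[i + d + 1] - t[i + 1]) * basis[:, i + 1]
            new[:, i] = left + right
        basis = new
    return basis


def pspline_fit(x, y, n_segments=10, degree=3, penalty_order=2, lam=1.0):
    """Penalized B-spline (P-spline) smooth of y on x with a fixed smoothing parameter."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    lo, hi = x.min(), x.max()
    dx = (hi - lo) / n_segments
    # Equally spaced knots, extended `degree` segments beyond the data range.
    knots = np.linspace(lo - degree * dx, hi + degree * dx, n_segments + 2 * degree + 1)
    B = bspline_basis(x, knots, degree)
    # Difference penalty on adjacent coefficients (order 2 penalizes curvature).
    D = np.diff(np.eye(B.shape[1]), n=penalty_order, axis=0)
    coef = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ coef
```

With a second-order penalty, a straight line is left unpenalized, so increasing `lam` pulls the fitted age effect toward a linear trend, while a small `lam` allows features such as the flat-then-rising shape that a threshold age would produce.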

Results

Socio-demographic and clinical characteristics of the sample are presented in Table 1. The absolute and relative frequencies of the WHODAS 2.0 items can be found in Table 2.

Table 1 Sample characteristics
Table 2 Frequencies of the response options in the sample considered for Rasch analysis (n = 1995)

Feasibility

Of the 2077 respondents, 2055 (98.9%) completed at least one WHODAS 2.0 item. Most patients (n = 1802, 86.8%) answered all questions, and 158 patients (7.6%) left one question blank. Two or three items were missing for 53 patients (2.6%), and 42 patients (2.0%) had four to 11 missing items. The items most frequently left unanswered were "Learning a new task" (n = 78, 3.8%), "Household responsibilities" (n = 74, 3.6%), and "Community activities" (n = 74, 3.6%). The associations of age, sex, education, marital status, presence of comorbidities, and self-rated health status with the completeness of the WHODAS 2.0 were examined in bivariate analyses, comparing persons with one to 12 missing items with those who completed all items. Figure 1 shows that incomplete WHODAS 2.0 questionnaires were significantly more frequent among women, older persons, and persons with poor education or bad health status.

Figure 1

Percentage of persons with an incomplete WHODAS 2.0 questionnaire (one to 12 missing items) in the study population (n = 2077). P-values refer to the χ² test for independence of frequencies in different strata.

These six variables were then tested for their association with the completeness of the WHODAS 2.0 in multivariate logistic regression models. No interaction between current age and sex was found. Table 3 shows the results of the full model including all six independent variables. Older persons and persons with bad/very bad health status were more likely to have missing WHODAS 2.0 items than younger persons and persons with very good/good health status, respectively.

Table 3 Factors associated with incomplete WHODAS 2.0 questionnaires (1 to 12 items missing): results of multivariate logistic regression analyses

Psychometric properties

(1) Model assumptions: All items showed high loadings on the general factor (mean: 0.82; range: 0.67-0.95; percentage of variance accounted for by the general factor: 67.3%) and loaded higher on the general factor than on the group factors, supporting the assumption of unidimensionality. Monotonicity was confirmed graphically: persons with higher rest-scores generally showed more difficulties on the respective item. To check for possible local dependence, we calculated residual correlations among all items based on a single-factor analysis. The residual correlations were r ≤ 0.2 for all but two item pairs (r = 0.31 for "Maintaining friendship" and "Dealing with unknown people", and r = 0.23 for "Concentrating" and "Dealing with unknown people").

(2) Computing and fitting the PCM: The PCM estimated with the original response options revealed unordered thresholds for all but three items ("Concentrating", "Standing", and "Being emotionally affected"). Therefore, the response options of all items were collapsed according to the strategy 01122 (0 = "None"; 1 = "Mild" and "Moderate"; 2 = "Severe" and "Extreme/cannot do"), and the model was re-estimated. The resulting distribution of person abilities, item difficulties, and item thresholds is presented in Figure 2.

Figure 2

Person-item-map.

Table 4 additionally contains the outfit and infit mean squares. Most of them are very close to 1, while five items show slight to moderate overfit and one item ("Learning a new task") shows slight underfit. Figure 3, however, shows that the observed frequencies are very close to the expected probabilities for all items, even for "Learning a new task". Where larger differences occur, the curve of observed frequencies is steeper than the expected curve, which corresponds to the definition of overfit.

Table 4 Item locations, item thresholds, and outfit and infit mean squares
Figure 3

Graphical assessment of item fit: Comparison of 1) expected probabilities for responding above the threshold based on the PCM (red line) and 2) observed response frequencies for groups of persons with close ability estimates ("x"s connected by dotted black line). If the observed frequencies are based on more than 20 persons, the x is drawn in black; for smaller groups it is grey.

(3) Testing for DIF: No DIF was detected for any of the variables (sex, age, education, marital status, presence of comorbidities, overall health status, and smoking status) in the PCM with the collapsed response options.

(4) Assessment of concurrent validity: For the PCM with collapsed response options, person ability was transformed into a score ranging from 0 to 100 (with 0 corresponding to perfect functioning/no disability). The results of the linear additive model predicting this disability score from sex, age, education, marital status, presence of comorbidities, and overall health status are presented in Table 5 and Figure 4. Table 5 shows that persons with comorbidities or worse overall health status had higher score values (i.e., worse functioning/more disability) than those without comorbidities and those who rated their health as good or very good, indicating high concurrent validity.

Table 5 Results on concurrent validity: the linear additive model
Figure 4

Results on concurrent validity: Nonlinear effect of age (solid line) resulting from the linear additive model and 95% credible intervals (dashed lines).

Figure 4 shows the nonlinear effect of age (solid line) from the linear additive model with 95% credible intervals (dashed lines). The age effect is almost constant up to age 68, after which increasingly higher score values, i.e., more disability, are expected.

Discussion

Our study investigated the feasibility and psychometric properties of the revised, currently used 12-item self-administered WHODAS 2.0 in a population-based sample of 2077 German patients with AMI. The questionnaire showed good feasibility: only 1% of respondents did not complete it at all, and 96% answered at least 10 items, the minimum required to calculate the WHODAS disability score according to the standard scoring rules [5]. So far, only one paper has reported on the feasibility of the revised 12-item WHODAS 2.0: in a sample of the Australian general population, 0.2% had missing data in one or more WHODAS 2.0 items administered by interview [11]. However, these results are hardly comparable with ours, as the studies differ in the mode of administration (interview versus self-administration) and in the characteristics of the study population (age, health status, country of origin) [41]. Our study is the first comprehensive report on the feasibility of the self-administered 12-item WHODAS 2.0 in patients with AMI, in a sample with a mean age of 67 years and a high proportion of persons with poor education. The proportion of usable questionnaires (96%) was comparable to or even higher than that reported for other health questionnaires, e.g., the Short-Form 12 Health Survey (SF-12), administered to elderly persons with cardiovascular diseases [42]-[44]. One reason for this higher proportion is the standard scoring algorithm of the WHODAS 2.0, which allows up to two missing items to be replaced by the mean score of the remaining items. Our finding that the completeness of the WHODAS 2.0 decreases with increasing age and with poorer health status is consistent with a number of previous studies using other questionnaires [41],[43],[45].

Knowledge about differential non-response is nevertheless crucial for study planning and analysis, as it may reduce statistical power through a smaller sample size and cause selection bias or non-differential information bias [46]. Administering a questionnaire by interview instead of by postal self-administration can improve response rates and reduce missing data [41]. In addition, methods for handling missing data, such as multiple imputation, have been recommended in order to quantify potential biases [41],[47].

In terms of psychometric properties, we showed that the 12-item WHODAS 2.0 fulfilled the assumptions of Rasch modeling. The confirmation of unidimensionality is consistent with Luciano et al. [13], who analyzed data from Spanish patients with major depression using exploratory principal component analysis and subsequent confirmatory factor analysis. Sousa et al. [12] examined the unidimensionality of the WHODAS 2.0 in elderly people living in seven low- and middle-income countries and showed that principal component analysis yielded a one-factor solution in most countries. In the study by Andrews et al. [11], a second-order one-factor solution with six first-order factors was the best-fitting model for the Australian general population.

In our study, the PCM revealed disordered thresholds for nine of the 12 items, whereas Luciano et al. [15] found that all items of the WHODAS 2.0 discriminated well in their population of patients with a first-time diagnosis of major depression. Difficulty in differentiating between the five response options, which may be particularly pronounced in our elderly and poorly educated population, is a potential reason for these different results. We also demonstrated that a correct order of thresholds could be achieved by collapsing the response options. Thus, in our sample the WHODAS 2.0 appears to discriminate better when the number of response options is reduced from five to three. Further studies in elderly populations with poor education are required to confirm these findings.

The item thresholds of our final PCM covered the whole range of the continuum. Therefore, the items, recoded into three response options, are appropriate for differentiating between persons across the whole continuum of disability. However, only a few thresholds are located at low levels of disability, permitting only rough differentiation between persons in this range. As the WHODAS 2.0 was originally developed for measuring disability in the general population, it can be expected to differentiate even less there than in our study population, and it might therefore not be an appropriate instrument for assessing disability in very healthy populations. However, differentiating disability levels in the healthiest segment of a population is not especially meaningful, as this subgroup is relevant neither for health care planning nor for health policy. In addition, if the original coding with five response options could be used for model estimation, more thresholds would be estimated, which would permit a finer distinction of persons' disability levels, likely including the lowest levels of disability.

Consistent with Luciano et al. [15], no item showed DIF by sex. This means that the WHODAS 2.0 disability score does not overestimate the level of disability in men compared with women or vice versa. Further DIF analyses showed that the items are also not biased by age, education, marital status, presence of comorbidities, overall health status, or smoking status.

Concurrent validity of the 12-item WHODAS 2.0 has previously been tested by Luciano et al. [13], who compared patients with a first major depressive episode who were on sick leave with those who were working and found significant differences in their WHODAS 2.0 scores. Our results support the ability of the 12-item WHODAS 2.0 to discriminate between subgroups of AMI patients that are reported to have worse health outcomes than others, namely persons with comorbidities or worse overall health status [21],[48].

To our knowledge, this is the first study that examined the feasibility and psychometric properties of the 12-item WHODAS 2.0 in patients with AMI. A strength of our population-based study is the inclusion of a large sample of patients from a defined area, selected according to defined criteria, with validated AMI and standardized assessment of demographic and clinical variables. In terms of psychometric analysis, Rasch analysis has a number of advantages over classical test theory methods, e.g., its ability to deal with incomplete data, the possibility of testing for DIF in different subgroups, and the interval scale of the resulting metric, on which both item difficulty and person ability can be meaningfully compared. Furthermore, parameters of Rasch models are generally neither sample- nor test-dependent, a property summarized under the term specific objectivity [49].

There are some limitations of our study that should be mentioned. The data we based our analyses on solely consist of German patients with AMI. Therefore, the generalization of our results to other patient groups and other settings might be limited. Furthermore, it cannot be excluded that some characteristics (e.g., linguistic and cultural aspects) of the German version of the WHODAS 2.0 could have influenced some of our results.

Conclusions

Our study demonstrated that the self-reported 12-item WHODAS 2.0 is feasible for use in a sample of persons with AMI characterized by a high proportion of elderly and poorly educated individuals. Rasch analysis showed that the 12-item WHODAS 2.0 is an unbiased instrument with respect to sex, age, marital status, education, presence of comorbidities, overall health status, and smoking status. Its items differentiate between persons across the whole continuum of disability. A shortcoming is the unordered thresholds of most items in our sample, which could be resolved by collapsing response categories.