The Work Role Functioning Questionnaire (WRFQ) is a generic role-specific instrument that measures the consequences of functional health status on the ability to accomplish work demands. Specifically, the WRFQ assesses the time (in percentage) in which workers experience difficulties in meeting work demands, such as work scheduling or physical demands, given their physical or emotional health status [1]. As a generic instrument, the development of the WRFQ was not restricted to a specific disease or occupation. Moreover, the instrument was developed to be used as a work outcome measure in different research settings, such as health services, clinical trials, occupational health interventions, or rehabilitation.

The original American version of the WRFQ consists of 27 items and five subscales. The WRFQ has been cross-culturally adapted and validated in Canadian French [2], Brazilian Portuguese [3], Dutch [4, 5], Spanish [6], and Norwegian and Danish [7]. During the cross-cultural adaptation to Dutch, a new version, the WRFQ 2.0, was developed. It incorporates five new items covering additional working conditions encountered in current labor markets and comprises four scales, namely Work scheduling and output demands, Physical demands, Mental and social demands, and Flexibility demands (WRFQ version 2.0 [4, 5, 7]; the respective items can be found in Additional file Table S1). Following a cross-cultural adaptation from Dutch to German, we aim to present the measurement properties of the German version of the WRFQ 2.0.


Cross-cultural adaptation into German

The adaptation of the Dutch WRFQ 2.0 into German followed the six-stage approach proposed by Beaton et al. [8]. The prefinal version was tested with a sample of 40 individuals (30 patients presenting psychosomatic symptoms, and 10 persons without symptoms), who also participated in cognitive interviews exploring issues such as content validity, wording, or logical structure of the items. As a result, some items were slightly adjusted to German language usage.

Respondents were asked to assess the extent to which they have had difficulties meeting the work demands due to physical or mental health issues in the last 4 weeks (prior to completing the survey).

The 27 items were answered on a five-point Likert scale with the options 0 = difficult all of the time (calculated as 100%), 1 = difficult most of the time, 2 = difficult half of the time (50%), 3 = difficult some of the time, and 4 = difficult none of the time (0%). Each item also has the option ‘does not apply to my job’.

Data analysis

The missing values resulting from the answer ‘does not apply to my job’ were imputed with the hot-deck algorithm in the statistical software R for the subsequent analyses.
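
As a rough illustration of the hot-deck idea, the sketch below fills each missing answer with the observed answer of a randomly chosen donor on the same item. It is a minimal sketch and not the R routine used in the study: the function name `hot_deck_impute`, the coding of ‘does not apply to my job’ as `None`, and the purely random donor selection are assumptions made for this example.

```python
import random


def hot_deck_impute(rows, seed=0):
    """Fill None entries with the same item's value from a random donor row.

    Assumption: simple random hot-deck per item; real implementations
    often pick donors within strata or by similarity.
    """
    rng = random.Random(seed)
    n_items = len(rows[0])
    completed = [list(row) for row in rows]  # leave the input untouched
    for j in range(n_items):
        donors = [row[j] for row in rows if row[j] is not None]
        if not donors:
            raise ValueError(f"item {j} has no observed value to donate")
        for row in completed:
            if row[j] is None:
                row[j] = rng.choice(donors)
    return completed


# Three respondents, three items; None marks 'does not apply to my job'.
responses = [
    [4, 3, None],
    [2, None, 1],
    [3, 4, 4],
]
imputed = hot_deck_impute(responses)
```

Because donors are drawn only from observed answers, every imputed value stays within the range of the item's actual responses.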

For scale construction, the items were summed with IBM SPSS 26, then divided by the number of items, and finally multiplied by 25 to obtain percentages between 0 (difficult all the time) and 100 (difficult none of the time). Thresholds of significance were set at p ≤ .05. Details of the cross-cultural adaptation are part of a doctoral thesis [9].
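
The scoring rule can be sketched as follows. This is a minimal Python illustration, not the SPSS syntax used in the study; the function name `wrfq_scale_score` is hypothetical, and complete (imputed) item responses coded 0 to 4 are assumed.

```python
def wrfq_scale_score(item_responses):
    """Return a 0-100 scale score: mean item response (0-4) times 25."""
    if not item_responses:
        raise ValueError("at least one item response is required")
    if any(not 0 <= r <= 4 for r in item_responses):
        raise ValueError("item responses must lie between 0 and 4")
    return sum(item_responses) / len(item_responses) * 25


# 4 = difficult none of the time on every item gives the maximum score.
best = wrfq_scale_score([4, 4, 4, 4, 4])
# 0 = difficult all of the time on every item gives the minimum score.
worst = wrfq_scale_score([0, 0, 0])
# Mixed answers: (2 + 3 + 4 + 3) / 4 * 25
mixed = wrfq_scale_score([2, 3, 4, 3])
```

Higher scores therefore indicate better work functioning, which matches the anchoring of 100 as "difficult none of the time".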

Design and sample

The sample was drawn from volunteers of a custom online panel in Germany in 2018. Inclusion criteria for the online survey were an age of 18–64 years, having worked more than 12 hours per week in the 4 weeks prior to study participation, and adequate reading comprehension skills in German. Individuals on parental leave, retirees, and the self-employed were excluded. Participants received small monetary incentives (T0: 1.50 €, follow-ups: 1 €).

We targeted a sample size of about 600 respondents for the cross-sectional survey at T0 to have a sufficient number of employees in the subsequent multivariate subgroup analyses. This sample size was considered appropriate for the construct validation, as it exceeds the recommended rule of thumb of 10 cases per item of the WRFQ, i.e., n = 270 [10]. To conduct reproducibility and responsiveness analyses, two follow-up measurements were performed at 1 week (T1) and 3 months (T2) after the baseline measurement at T0. For the T1 and T2 follow-up, we targeted the participation of 50 and 100 individuals, respectively. To ensure stable conditions, we rechecked the inclusion and exclusion criteria mentioned above at the follow-ups. The usability of the online survey was pretested among five employees.

Since the main purpose of the WRFQ is to measure the extent to which workers experience difficulties in meeting the work demands given a certain level of health, it was important to sample employees from different occupational settings. Therefore, an equiproportional quota sampling was defined based on the following three occupational categories: 1. blue-collar workers (e.g. workers in the manufacturing and processing industry, and craft professions), 2. gray-collar workers (e.g. health care, support and medical assistance occupations, service professions in the areas of facility management, caretakers, cleaning and security services, warehouse, and trade), and 3. white-collar workers (e.g., social workers, clerks and other respective professionals working in offices).

Instrument validation

The investigation of the measurement properties of the German WRFQ followed the COSMIN criteria [11] and consisted of the analysis of the structural, convergent and discriminant validity, floor and ceiling effects, internal consistency, reproducibility, and responsiveness. We aimed to replicate the Dutch validation study with no further development of the instrument. We therefore used the same methods as the working group of Abma et al. [5].

Structural validity

An exploratory factor analysis was carried out by principal component analysis with the eigenvalue criterion and varimax rotation. The factor structure was defined by taking into account only items with loadings > 0.4 [12].
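
The procedure can be sketched in Python with NumPy. This is an illustrative sketch only and not the software routine used in the study: the function names `varimax` and `pca_varimax`, the simulated six-item data set, and the omission of Kaiser normalization are assumptions made for this example. The eigenvalue criterion is read here as retaining components with eigenvalues above 1.

```python
import numpy as np


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation of an (items x factors) loading matrix."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        gradient = loadings.T @ (
            rotated**3 - rotated @ np.diag(np.sum(rotated**2, axis=0)) / p
        )
        u, s, vt = np.linalg.svd(gradient)
        rotation = u @ vt  # closest orthogonal matrix to the gradient
        new_criterion = s.sum()
        if criterion > 0 and new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


def pca_varimax(data, loading_cutoff=0.4):
    """PCA on the correlation matrix, eigenvalue > 1, then varimax."""
    corr = np.corrcoef(data, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]  # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    keep = eigvals > 1.0
    loadings = eigvecs[:, keep] * np.sqrt(eigvals[keep])
    rotated = varimax(loadings)
    salient = np.abs(rotated) > loading_cutoff  # loadings counted per factor
    return rotated, salient


# Simulated data (assumption): items 0-2 and items 3-5 follow two
# independent latent factors.
rng = np.random.default_rng(0)
weights = np.array([[0.9, 0.9, 0.9, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.7, 0.7, 0.7]])
noise_sd = np.array([0.4, 0.4, 0.4, 0.7, 0.7, 0.7])
data = rng.normal(size=(500, 2)) @ weights + rng.normal(size=(500, 6)) * noise_sd
rotated, salient = pca_varimax(data)
```

Each row of `salient` marks the factors on which an item loads above the cutoff, which is how items would be assigned to subscales in this sketch.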

Convergent and discriminant validity

The following constructs and instruments were used for the convergent validity analysis: productivity assessed with the Endicott Work Productivity Scale (EWPS [13]), overall work ability with the single item derived from the Work Ability Index (WAI; ‘Assuming that the highest work ability you have ever had is 10, how would you rate your current work ability?’, 0 = absolutely unable to work to 10 = best work ability [14]), Decision latitude and Job demands with the Job Content Questionnaire (JCQ [15]), and General health with the respective single item derived from the 12-item Short Form Survey (SF-12) health questionnaire [16]. Convergent validity was determined by assessing the extent to which the strength of the correlations (Pearson or Spearman rho) of the WRFQ with similar constructs agrees with a set of pre-defined hypotheses. High discriminant validity was assumed if correlations with non-related constructs were low. Correlations were classified as either small (0.15 ≤ r < 0.25), moderate (0.25 ≤ r < 0.35), or large (0.35 ≤ r) [17].
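
The classification thresholds can be written down as a small helper. This is a sketch under two assumptions that the text does not state: the absolute value of the coefficient is classified, and coefficients below 0.15 receive the hypothetical label "below small". The function name `classify_correlation` is likewise illustrative.

```python
def classify_correlation(r):
    """Label a correlation coefficient using the thresholds given in [17]."""
    size = abs(r)  # assumption: direction is ignored for the size label
    if size >= 0.35:
        return "large"
    if size >= 0.25:
        return "moderate"
    if size >= 0.15:
        return "small"
    return "below small"  # assumption: the source defines no label here


labels = [classify_correlation(r) for r in (0.09, 0.19, 0.30, 0.77)]
```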

The hypotheses of the convergent validity (no. H1–3) and discriminant validity (no. H4 and H5) analyses were: A high WRFQ total scale value correlates …

  • H1: … with a high work productivity value (EWPS scale; moderately to highly).

  • H2: … with a good self-reported general health value (SF-12 item General health) (moderately).

  • H3: … with a good overall work ability (WAI item) (moderately).

  • H4: … with a high decision latitude (JCQ subscale; weakly).

  • H5: … with low psychological job demands (JCQ subscale; weakly).

Both convergent and discriminant validity, measured by Spearman’s correlation coefficient, are considered acceptable if at least 75% of the hypotheses are confirmed [10].
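
The 75% rule amounts to a simple share calculation, sketched below. The function name `hypotheses_acceptable` and the example outcomes are hypothetical and do not reproduce the study results.

```python
def hypotheses_acceptable(confirmed, required_share=0.75):
    """Return the share of confirmed hypotheses and whether it reaches 75%."""
    share = sum(confirmed) / len(confirmed)
    return share, share >= required_share


# Hypothetical outcome: four of five hypotheses confirmed.
share, acceptable = hypotheses_acceptable([True, True, True, True, False])
```

With five hypotheses, at least four must be confirmed, since three of five (60%) falls below the threshold.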

Floor and ceiling effects of scales

Floor and ceiling effects of a scale were considered present if more than 15% of the responses were at the lowest or highest attainable scores of the scale, respectively [10].
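
A minimal sketch of this check is given below, assuming scale scores on the 0 to 100 metric described above. The function name `floor_ceiling_effects` and the example scores are illustrative only.

```python
def floor_ceiling_effects(scores, minimum=0.0, maximum=100.0, threshold=0.15):
    """Share of respondents at the scale limits and whether it exceeds 15%."""
    n = len(scores)
    floor_share = sum(s == minimum for s in scores) / n
    ceiling_share = sum(s == maximum for s in scores) / n
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share > threshold,
        "ceiling_effect": ceiling_share > threshold,
    }


# Hypothetical sample: 2 of 10 respondents reach the maximum score.
result = floor_ceiling_effects([100.0] * 2 + [50.0] * 8)
```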

Internal consistency

The reliability of the items was analyzed by assessing Cronbach’s α, the intraclass correlation coefficient (ICC), and the inter-item and item-to-total correlations of the scales. Values of Cronbach’s α and ICC greater than 0.7 are considered appropriate for group comparisons [18]. Inter-item and item-to-total correlations were considered appropriate if they fell within the ranges 0.2 to 0.8 and 0.3 to 0.9, respectively [19].
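
For illustration, Cronbach’s α can be computed from the item variances and the variance of the sum score. The sketch below uses only the Python standard library; the function name `cronbach_alpha` and the toy data are assumptions for this example, and sample variances (n − 1 in the denominator) are used.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_variances = [variance([row[j] for row in rows]) for j in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Toy data: three respondents, two items.
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 3]])
```

In this toy case both items have variance 1 and the sum score has variance 3, so α = 2 × (1 − 2/3) ≈ 0.67, just below the 0.7 threshold.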


The reproducibility was assessed with the ICC and was considered acceptable at the group and individual level for ICC > 0.7 and ICC > 0.9, respectively [18]. Additionally, the standard error of measurement (SEM) was calculated as SDdiff/√2, where SDdiff denotes the standard deviation of the differences between the two measurements.
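
The SEM formula can be sketched as follows, assuming that SDdiff is the sample standard deviation of the individual score differences between the two measurements. The function name `sem_from_retest` and the example scores are hypothetical; the ICC itself is not covered by this sketch.

```python
import math
from statistics import stdev


def sem_from_retest(first, second):
    """Standard error of measurement: SD of the score differences over sqrt(2)."""
    differences = [b - a for a, b in zip(first, second)]
    return stdev(differences) / math.sqrt(2)


# Hypothetical test and retest scores of four respondents.
sem = sem_from_retest([80, 70, 90, 60], [82, 66, 90, 62])
```

Here the differences are 2, −4, 0 and 2 with a standard deviation of √8, so the SEM is √8/√2 = 2 scale points.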


The sensitivity of the instrument to measure changes between T0 and T2 was evaluated by comparing the mean changes of the WRFQ and of the overall work ability (global item). In addition, the responses to two additional items at T2, the so-called global perceived effect (GPE) items, which measure the extent to which respondents perceived changes in their mental and physical work ability since baseline (e.g., ‘to what extent has your work ability changed regarding the mental demands at work in the last 3 months?’, 1 = much better, 5 = much worse) were examined.

The mean change of the WRFQ scores was estimated for the total scale and subscales by calculating the mean differences between T0 and T2 and the respective standard deviations (SDs). The standardized response mean (SRM; ratio between the mean change score and its SD) was calculated for all scores (WRFQ total and subscales). Furthermore, the WRFQ mean changes were correlated with the mean changes of work ability and the respective GPE items using Spearman’s correlation coefficient rho.

SRM effect size categories were defined as < 0.2 (trivial), 0.2 to < 0.5 (small), 0.5 to < 0.8 (moderate), and ≥ 0.8 (large) [20]. An at least moderate correlation between the WRFQ measurement change and the change of work ability between T0 and T2 was expected, as well as stable responses in a large part of the sample. On the basis of this set of change measures, the following hypothesis was formulated:

  • Hypothesis H6: a) The correlation of the changes in overall work ability and the GPE items on mental and physical work ability is high. The correlations between the WRFQ mean change scores and b) the global perceived effect (GPE) items of work ability and c) the change of the global work ability item between T0 and T2 are at least moderate.
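
The SRM and its effect size categories described above can be sketched as follows. The function names and the example scores are hypothetical, and classifying the absolute value of the SRM is an assumption of this sketch.

```python
from statistics import mean, stdev


def standardized_response_mean(baseline, follow_up):
    """Mean of the individual change scores divided by their SD."""
    changes = [f - b for b, f in zip(baseline, follow_up)]
    return mean(changes) / stdev(changes)


def classify_srm(srm):
    """Effect size category according to the thresholds given in [20]."""
    size = abs(srm)  # assumption: the direction of change is ignored
    if size < 0.2:
        return "trivial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "moderate"
    return "large"


# Hypothetical scores of four respondents at baseline and follow-up.
srm = standardized_response_mean([90, 80, 70, 85], [70, 65, 60, 60])
category = classify_srm(srm)
```

A negative SRM, as in this example, indicates a decrease of the score between the two measurements.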


Response rate and sample

At T0, 4,694 participants of the online access panel were contacted. The final sample consisted of 653 employees (response rate 14%; see Additional file Table S2). The sample sizes and response rates at T1 and T2 follow-up were nT1 = 66 (33%) and nT2 = 95 (16%), respectively. No major differences were found concerning age, gender and job type between the T0 and T2 samples.

The respondents at T0 consisted of 239 white-, 194 gray- and 220 blue-collar workers (36.6, 29.7 and 33.7%, respectively). Nearly half of the sample was female (47.3%); the average age was 43 ± 12 years. Almost a quarter (24.0%) had jobs with shift work and 60.3% of the participants worked in small or medium-sized companies. More than half (58.8%) reported excellent/good health, and respondents rated their global work ability on average at 8.6 (SD 1.8, range 0–10) (see Additional file Table S3).

Descriptive results of the WRFQ items

Item means ranged from 2.4 (SD 1.2; no. 9 ‘Feel a sense of accomplishment in your work’) to 3.6 (SD 0.8; no. 15 ‘Use hand-held tools or equipment’) (see Additional file Table S4). The option ‘Does not apply to my job’ was chosen by between 6.0 and 20.4% of respondents for the following five items: ‘Lift, carry, or move objects at work weighing more than 5 kg’, ‘Use hand-held tools or equipment’, ‘Ability to concentrate for reading and processing the information’, ‘Speak with people in-person’, and ‘Process incoming information’ (items no. 11, 15, 20, 21, and 25).

Structural validity

The exploratory factor analysis revealed a factor structure based on four subscales, but the factor content of the German version differed from that of the Dutch version. The subscales Mental and social demands and Flexibility demands described in the Dutch version were identified as one subscale in the German data (Factor 1). The Dutch subscale Work scheduling and output demands, on the other hand, was divided into two subscales in the German data (Factors 2 and 4). The subscale Physical demands could be well replicated (Factor 3) (see Table 1).

Table 1 Factor structure (German version) at T0 vs. factor structure in a Dutch sample reported by Abma et al. [5]

In close reference to the Dutch version, the four subscales derived from the factor analysis were named as follows: WRFQ-F1 Work scheduling and output demands (10 items), WRFQ-F2 Physical demands (5 items), WRFQ-F3 Mental and social demands (7 items), and WRFQ-F4 Flexibility demands (5 items).

Table 2 shows the results of the convergent and discriminant validity analysis. In agreement with hypotheses H1 to H3 (convergent validity), the correlations of the WRFQ total scale and subscales with the EWPS productivity, the SF-12 global health item, and the global work ability item (WAI) were moderate to large. The discriminant validity assumed in H4 and H5 (Decision latitude and Psychological job demands) could also be confirmed.

Table 2 Correlation results (convergent and discriminant validity; Spearman’s rho; T0; n = 653)

Floor/ceiling effects, internal consistency and reproducibility

Neither floor nor ceiling effects were detected (Table 3, see columns labeled T0). The largest proportion of respondents reaching the highest attainable scale value of 100 was found for Flexibility demands with 13.5%.

Table 3 Psychometric properties of the German version of the WRFQ and its subscales

The internal consistency according to Cronbach’s α was appropriate. The ICC estimates were equal to or above the threshold of 0.7. Moreover, the values for the inter-item (between 0.2 and 0.8) and item-to-total correlations (between 0.3 and 0.9) affirmed the internal consistency of the German WRFQ (see again Table 3, T0).

The reproducibility of the instrument at T1 was acceptable at the group level with ICC > 0.8.


Means of WRFQ values at T0 and T2 and mean change scores are also reported in Table 3. The change of the total WRFQ score was −17.96 (SD 13.36). The highest change was found for the subscale Work scheduling and output demands, followed by Physical demands and Mental and social demands with lower decreases. The values indicate a reduced work function with high effect sizes at T2 (SRM = 1.34 for the total WRFQ score).

The overall assessed current work ability value deteriorated from 8.7 to 8.0 between T0 and T2. The mean change was −0.63 (SD 1.7), indicating a small difference (SRM = 0.37).
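
As a worked check of the two reported effect sizes, assuming that the SRM is taken as the absolute mean change divided by the SD of the change scores:

```latex
\mathrm{SRM} = \frac{\lvert \bar{d} \rvert}{SD_{d}}, \qquad
\mathrm{SRM}_{\text{WRFQ total}} = \frac{17.96}{13.36} \approx 1.34, \qquad
\mathrm{SRM}_{\text{work ability}} = \frac{0.63}{1.7} \approx 0.37
```

According to the effect size categories defined in the methods, the first value is large (≥ 0.8) and the second is small (0.2 to < 0.5).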

Regarding hypothesis H6, we found a weak correlation between the mean change scores of the WRFQ and the work ability item between T0 and T2 (rho = 0.19; see Additional file Table S5). This finding was accompanied by a lack of correlation with the subjective assessments of the respondents, namely the GPE items concerning subjective changes in physical and mental work ability (rho ≤ 0.09 and 0.13), with the exception of small correlations with the subscale Mental and social demands (rho = 0.18 and 0.20, respectively).


We evaluated the measurement properties of the German Work Role Functioning Questionnaire, which was cross-culturally adapted from the further developed Dutch version, in a German working population. The translated and adapted instrument shows good structural validity, although the subscales were only replicable to a limited extent compared to the Dutch version. The total WRFQ scale, however, can be seen as an internationally comparable instrument.

Since the subscale Flexibility demands of the Dutch version could not be replicated in the present study, it seems that there is some semantic overlap between items 16 to 22 (Mental and social demands) and items 23 to 27 (Flexibility demands) in the Dutch version. This might lead respondents to interpret both item groups within a similar frame of reference. This interpretation seems to be supported by the fact that the highest subscale correlation in the Dutch version (r = 0.77) was observed between the subscales Mental and social demands and Flexibility demands [7].

Most of the other measurement properties of the German version of the WRFQ (internal consistency, reproducibility and floor or ceiling effects) were good. The moderate to large correlations between the WRFQ score and Work productivity (EWPS), the Overall work ability and General health items, and the lack of correlations with the two JCQ scales indicate that the WRFQ and those constructs measure related, but not identical, constructs. Hence, there was evidence of convergent and discriminant validity of the German WRFQ.

The responsiveness of the instrument was not sufficient. This is in line with previous results from a Dutch and a Spanish working population, which showed only moderate responsiveness [5, 6]. The relatively large mean change of the WRFQ score between T0 and T2 might indicate a considerable self-selection into the T2 sample.

Given the only small change in the WAI item between T0 and T2, we do not assume a major health deterioration and an associated reduction in work role functioning in the T2 population at 3 months. However, we cannot state this with certainty, as we did not repeat the question on global health at T2. This must be regarded as a study limitation and calls for further research.

Strengths and limitations of the study

The major strength of our study is the validation of a generic instrument to assess the health-related work functioning of working people with regard to common and important aspects such as work scheduling and physical, mental and social demands at work. A further strength is the validation, in a general working population sample, of an instrument that was originally developed for people with health problems. This has so far only been done in the Netherlands, Norway and Denmark. Finally, the responsiveness of the instrument was tested, which has rarely been done and adds to the literature.

One limitation is the restricted representativeness of members of an online access panel for the working population: not all professions and occupational positions could be mapped in our sample. For example, online access panels typically include only a small number of individuals in management positions. Further studies in these subgroups would therefore be valuable. A critical discussion of online access panels can be found in Burgess et al. [21]. The restricted knowledge about the health status of the sample is a further limitation.


The German WRFQ is a short, psychometrically valid instrument consisting of 27 items. It can be used in the assessment and monitoring of work functioning of workers of different ages, with different health status and occupations. The WRFQ may be used as a work outcome parameter in interventions aiming at maintaining the functioning at work or employability after return to work.