Background

Stress at work is becoming an increasingly relevant issue, with one in six European employees reporting chronic health problems [1]. The resulting costs of stress at work are internationally considered a significant financial burden on society (US$ 221.13 million to US$ 187 billion) [2]. In Switzerland, for example, work-related stress accounts for 24% of total health-related production losses due to absenteeism as well as presenteeism, which corresponds to 3.2% of employees’ average monthly earnings [3]. Work-related stress is defined as ‘a pattern of reactions that occurs when workers are presented with demands or pressures (stressors) that are not matched to their knowledge, abilities and skills and which challenge their ability to cope’ [4, 5].

Health professionals in particular are frequently affected by various stressors at work, such as work-private life conflicts, understaffing, long working hours, high quantitative and emotional demands and reward frustration [6,7,8,9,10]. Stress at work potentially leads to lower job satisfaction and commitment to the organization, and is associated with health professionals’ intention to leave their profession prematurely [11,12,13]. In consequence, work-related stress may exacerbate the issue of workforce shortage of qualified health professionals in several countries [14]. In Switzerland, the healthcare system is also struggling with such a shortage [15].

Assessment tools that capture stressors and consequences of stress at work among health professionals in a reliable and valid way are essential in developing appropriate prevention and intervention strategies. Several studies have been conducted to assess work-related stress and intention to leave among health professionals, such as the European longitudinal Nurses’ Early Exit study [16,17,18] or the RN4CAST study [19], using selected scales of the Copenhagen Psychosocial Questionnaire (COPSOQ) to cover relevant topics among health professionals. The COPSOQ developed by Kristensen [20] is one of the most widely used instruments and has been translated into more than 25 languages [21,22,23]. The COPSOQ is a self-report questionnaire that assesses psychosocial stressors and stress reactions as well as individual health and well-being [5], and has the advantage of a scientifically grounded theoretical background [24]. The COPSOQ is available in a short, middle or long version and is designed for workplace surveys, analytic research and international comparisons [5, 20, 22]. The scales and single items included in the COPSOQ are used to assess various stressors at work, such as demands (e.g. quantitative demands, sensorial demands), work organisation and content (e.g. influence at work, opportunities for development, meaning of work), social relations and leadership (e.g. predictability of work, role clarity, role conflicts, quality of leadership, social support at work), the person-work interface (e.g. job insecurity) as well as the home-work interface (e.g. work-private life conflict, demarcation). In addition, scales assessing employees’ stress reactions (e.g. behavioural or cognitive stress symptoms) and possible long-term consequences of stress at work (e.g. burnout symptoms) are included [22].

The COPSOQ has already been used in the healthcare sector, translated and validated in German, French and Italian and tested in previous studies [17, 25,26,27,28]. The current version, number 3, of COPSOQ developed by the International COPSOQ Network [29] consists of so-called core items that are mandatory in any national version and further items that can be added. Thus, every national version differs in these further questions. Consequently, since the available translated versions have been adapted to the cultural conditions of the country for which they were designed and differ greatly in terms of topics and item selection, comparable French, Italian and German versions of the questionnaire for multilingual studies are currently lacking. As an outlook for further developments of the questionnaire, the COPSOQ international network strives for international comparability and calls to examine validity across countries [25]. A comparable version in German, French and Italian is especially important for countries with these national languages, such as Switzerland (66% German-speaking, 23% French-speaking, 8% Italian-speaking). In multilingual samples like Switzerland, cultural adaptation is important to understand if the linguistic groups interpret and understand the items in the same way. Therefore, comparable items / scales are essential [30].

This study aims to present selected scales and single items from the German COPSOQ Version translated into French and Italian and to analyse their psychometric properties in a large and heterogeneous sample of health professionals in Switzerland.

Methods

Design

This study was conducted in two phases. First, the selected scales and single items from the COPSOQ were translated from German into French/Italian, culturally adapted and tested using ‘cognitive debriefing’ in interviews.

Second, the translated scales and single items were psychometrically validated in a large group of health professionals as part of the STRAIN project (work-related stress among health professionals in Switzerland). Briefly, STRAIN is an ongoing cluster randomized controlled trial (ClinicalTrials.gov identifier: NCT03508596) that is based on three measurements: the baseline T0, the first measurement T1 and the second measurement T2. The results presented in this study are based on data from the STRAIN baseline measurement T0 (September 2017 to March 2018) and the first measurement T1 (January to May 2019). Cases with repeated measurements were identified and removed (e.g. if a person filled out the questionnaire at T0 and T1, the case at T1 was removed), so the study is based on cross-sectional data only. Further details regarding the STRAIN project are published in Peter et al. [31].
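The removal of repeated measurements described above can be sketched as follows. This is a minimal Python illustration, not the authors' procedure: the field names `person_id` and `wave` are hypothetical, and the text does not state how repeated cases were actually identified.

```python
from __future__ import annotations

from typing import Iterable


def drop_repeated_measurements(cases: Iterable[dict]) -> list[dict]:
    """Keep one case per person, preferring the earlier wave (T0 over T1)."""
    wave_order = {"T0": 0, "T1": 1}
    kept: dict[str, dict] = {}
    for case in cases:
        pid = case["person_id"]  # hypothetical identifier, assumed for illustration
        current = kept.get(pid)
        # Store the case if the person is new or if this wave is earlier.
        if current is None or wave_order[case["wave"]] < wave_order[current["wave"]]:
            kept[pid] = case
    return list(kept.values())


cases = [
    {"person_id": "a", "wave": "T0"},
    {"person_id": "a", "wave": "T1"},  # repeated measurement, dropped
    {"person_id": "b", "wave": "T1"},
]
unique_cases = drop_repeated_measurements(cases)
```

Each person then contributes exactly one questionnaire, which is what allows the pooled T0 and T1 data to be treated as cross-sectional.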

Recruitment and study sample

Health organisations were randomly selected from all hospitals, nursing homes, and home care organisations registered by the Swiss Federal Statistical Office in 2016. These included Swiss acute care, rehabilitation and psychiatric hospitals, nursing homes and home care organizations from all language regions of Switzerland. A total of 100 hospitals, 100 nursing homes, and 100 home care organisations were randomly selected from the German, French, and Italian-speaking regions of Switzerland using a web-based randomization approach [32] also ensuring a geographically representative sample for Switzerland. Overly small (average number of beds < 20, < 7 employees) or specialised organisations (e.g. in gynaecology or neonatology) were excluded.

Selected organisations were invited to participate and provided with information about the study. A total of 36 acute care, rehabilitation or psychiatric hospitals (23 German-speaking, 12 French-speaking, 1 Italian-speaking), 86 nursing homes (56 German-speaking, 24 French-speaking, 6 Italian-speaking) and 41 home care organisations (36 German-speaking, 3 French-speaking, 2 Italian-speaking) agreed to take part in the study [31].

Content and use of the questionnaire

Using the German COPSOQ versions from 2005 and the extended German standard version 2017 ([26]; Nübling et al. 2017 [33]), we selected scales for translation and validation that previous studies [34] had considered relevant to the work environment and demands at work in the healthcare sector. Table 1 shows the seven domains and 29 selected COPSOQ scales that were translated and validated for this study. All questions (i.e. items) for the three languages are available in Supplement A. For all scales used in the questionnaire, consent was obtained from the original author for their use. The COPSOQ versions are not under license. The scales we included from the COPSOQ showed satisfactory to good construct validity, criterion validity, diagnostic power and reliability (Cronbach’s alpha 0.64–0.89) in previous studies [22, 25, 26].

Table 1 Domains, scales and number of items per scale in the German, French, and Italian short/modified version of COPSOQ

The item responses are scored on a five-point Likert scale (1 = always, 2 = often, 3 = sometimes, 4 = seldom, 5 = never/hardly ever or 1 = to a very large extent, 2 = to a large extent, 3 = somewhat, 4 = to a small extent, 5 = to a very small extent). The polarity of the Likert scales differs between the scales, e.g. for scales on demands at work high scores indicate a higher risk for work-related stress, while for the scales on opportunities for development or influence at work low scores indicate a higher risk for work-related stress. The total scale scores are calculated as the average of the item responses and transformed to a value range from 0 (never/hardly ever or to a very small extent) to 100 (always or to a very large extent), taking reverse-scored items into account as well. This transformation of item values from the range 1 to 5 to the range 0 to 100 is done in most publications using the COPSOQ to allow comparability of results when using different COPSOQ versions [22]. According to the original author of the COPSOQ [22], scale scores can be calculated if at least half the items are not missing (e.g. for a scale with 5 items, the mean is calculated if at least 3 of the 5 items are completed). No imputation procedure for missing values was performed.
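The scoring rule described above can be illustrated with a short Python sketch. The function names and the `reverse_flags` argument are assumptions made for this example; only the 1 to 5 response coding, the 0 to 100 range and the "at least half the items" rule come from the text.

```python
from typing import Optional, Sequence


def item_to_0_100(response: int, reverse: bool = False) -> float:
    """Map a 1-5 response to 0-100; option 1 ('always') scores 100 unless reversed."""
    if response not in (1, 2, 3, 4, 5):
        raise ValueError("response must be an integer from 1 to 5")
    score = (5 - response) * 25.0
    return 100.0 - score if reverse else score


def scale_score(
    responses: Sequence[Optional[int]],
    reverse_flags: Optional[Sequence[bool]] = None,
) -> Optional[float]:
    """Mean of the answered items, or None if fewer than half were answered."""
    if reverse_flags is None:
        reverse_flags = [False] * len(responses)
    scored = [
        item_to_0_100(r, rev)
        for r, rev in zip(responses, reverse_flags)
        if r is not None  # None marks a missing answer; no imputation
    ]
    if not scored or 2 * len(scored) < len(responses):
        return None
    return sum(scored) / len(scored)


# Five-item scale with two missing answers: 3 of 5 items suffice.
example = scale_score([1, 2, None, None, 3])  # mean of 100, 75, 50
```

With only two of five items answered, `scale_score` returns `None`, which mirrors the rule that no scale score is calculated in that case.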

Translation and cultural adaptation

Items from selected German COPSOQ scales were translated and cross-culturally adapted to French and Italian in accordance with the established guidelines for scientific translation processes, the “ISPOR Principles of Good Practice” [35]. Figure 1 presents the stages of the translation process. In stage one, all items were independently forward translated by a native French/Italian-speaking health professional and a native French/Italian-speaking professional translator. After translation, the two versions were compared and discussed (peer group stage 1: two first authors and native French/Italian-speaking translators), and a common final version 1 was created. In stage two, the translated items were independently back translated into German by a French/Italian-speaking health professional and a translator, who were native German speakers. Afterwards, language discrepancies were resolved by discussion (peer group stage 2: two first authors and native German-speaking translators), and a final version 2 was created. If questions arose regarding the comprehensibility of individual items, the original author of the German COPSOQ scale was involved. In a last step, the translated items were tested using ‘cognitive debriefing’ [35] to determine the acceptability, understandability and clarity of the translation. For this purpose, interviews with 5 native French-speaking and 5 native Italian-speaking health professionals were conducted and all items were tested. After those interviews, a few adjustments were made by the translation team (two first authors, native French/Italian-speaking, and German-speaking translators). Afterwards a final version was created and proofread by a translation agency (Final Version).

Fig. 1 Methodological steps of translation and testing

Data collection

For data collection, all health professionals (nurses, midwives, medical-technical, medical-therapeutic professionals, physicians) in the participating organisations were invited to participate. The questionnaire was available in German, French and Italian, in an online and a paper version (including a direct reply envelope). Participation was voluntary for organisations as well as for health professionals, who could choose the version of the questionnaire they preferred (online or paper).

Psychometric and statistical analysis

Participants’ characteristics and validation statistics for all scales were stratified by language group. Since not all scales contain a sufficient number of items to calculate all psychometric coefficients (e.g. single-item scales), reliability was calculated only for scales with at least two items [36] and construct validity for scales with at least three items [37]. Reliability was investigated using Cronbach’s alpha and intraclass correlation coefficients. Although Cronbach’s alpha can be calculated for two-item scales, it may underestimate their true reliability [36]. Floor and ceiling effects were calculated as the proportion of respondents choosing the lowest and highest response options for all items within a scale, adhering to the procedure from comparable studies [23, 38].
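For illustration, the two statistics named above can be computed as in the following Python sketch. It is not the authors' code (the analyses were run in R); it assumes complete cases, one row per respondent and one column per item, and the function names and example data are invented for this example.

```python
from statistics import variance
from typing import Sequence, Tuple


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha from item variances and the variance of the sum score."""
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs a scale with at least two items")
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def floor_ceiling(
    rows: Sequence[Sequence[float]], lowest: float = 1, highest: float = 5
) -> Tuple[float, float]:
    """Share of respondents choosing the lowest / highest option on ALL items."""
    n = len(rows)
    floor = sum(all(x == lowest for x in row) for row in rows) / n
    ceiling = sum(all(x == highest for x in row) for row in rows) / n
    return floor, ceiling


# Invented two-item example data (four respondents).
alpha = cronbach_alpha([[1, 2], [2, 1], [3, 4], [4, 3]])
effects = floor_ceiling([[1, 1], [5, 5], [1, 2], [5, 5]])
```

In the first example the item variances are equal and the sum score carries most of the variance, giving an alpha of 0.75; in the second, one of four respondents is at the floor and two are at the ceiling.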

Furthermore, we calculated intraclass correlation coefficients, ICC(3,1), in accordance with the recommendation by Shrout and Fleiss [39] that ICC(3,1) be used to measure the consistency of multiple ratings (two-way mixed effects analysis of variance (ANOVA); each subject is measured by a fixed set of items), using the psych package in R [40]. For Cronbach’s alpha, values > 0.7 indicate scale suitability, whereby a higher number of items normally results in a higher coefficient [41]. ICC values of less than 0.40, between 0.40 and 0.59, between 0.60 and 0.74, and 0.75 or greater are indicative of poor, fair, good, and excellent reliability, respectively [42].
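As a sketch of what ICC(3,1) computes, the following Python code derives it from the two-way ANOVA mean squares as (BMS − EMS) / (BMS + (k − 1) · EMS), where BMS is the between-subjects mean square, EMS the residual mean square and k the number of items. The authors used the psych package in R; this standalone version, its function names, the handling of the band boundaries and the small example matrix are assumptions for illustration only.

```python
from typing import Sequence


def icc_3_1(rows: Sequence[Sequence[float]]) -> float:
    """ICC(3,1): two-way mixed model, consistency, single measure."""
    n, k = len(rows), len(rows[0])
    grand = sum(sum(r) for r in rows) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(r[j] for r in rows) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between items
    ss_total = sum((x - grand) ** 2 for r in rows for x in r)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    bms = ss_rows / (n - 1)
    ems = ss_error / ((n - 1) * (k - 1))
    return (bms - ems) / (bms + (k - 1) * ems)


def interpret_icc(value: float) -> str:
    """Bands as given in the text; boundary handling is an assumption."""
    if value < 0.40:
        return "poor"
    if value < 0.60:
        return "fair"
    if value < 0.75:
        return "good"
    return "excellent"


# Illustrative data: six subjects (rows) measured on four items (columns).
ratings = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_3_1(ratings)  # approximately 0.71
label = interpret_icc(icc)
```

Because the item (column) effect is removed before the residual is formed, systematic differences in item difficulty do not lower ICC(3,1); it reflects consistency rather than absolute agreement.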

Construct validity and associations between latent constructs were estimated using confirmatory factor analysis (CFA) and structural equation modelling using latent variable analysis in R [43, 44]. CFA tests the given theoretical model and defines its measure of quality [45]. Construct validity was estimated a) on scale level by using single items as indicators, and b) on domain level by using the mean values of scales as indicators. For the latter we used structural equation modelling to assess the strength of association between the different psychological domains. Standardized loadings/coefficients (β), corresponding standard errors (SE), and R-squared (amount of scale variance explained by the latent variable) are shown. Factor loadings above 0.4 were considered satisfactory [46]. Various measures were used to estimate model fit. A root mean square error of approximation (RMSEA) below 0.05 was considered good (below 0.08 acceptable); a standardized root mean square residual (SRMR) below 0.08 and a comparative fit index (CFI) above 0.95 were considered a satisfactory fit [43, 47, 48]. In multilingual studies, comparability of the data from different language versions is crucial. Hence, the assumption that the instrument measures the same psychological construct across language groups was tested. To compare CFA models (on scale level) across language groups, likelihood ratio tests were conducted [49]. Analyses were performed using R (version 3.5.1) [50].
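The fit criteria above can be made concrete with a small Python sketch. The cut-offs are those stated in the text; the RMSEA and CFI formulas are one common textbook formulation based on the model chi-square, and software such as the R packages used in this study may apply slightly different conventions (for example N rather than N − 1 in the RMSEA denominator). All function names and input values are hypothetical.

```python
import math
from typing import Dict


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA from the model chi-square (common formulation with N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_model: float, df_model: int, chi2_base: float, df_base: int) -> float:
    """CFI comparing the tested model with the baseline (independence) model."""
    d_model = max(chi2_model - df_model, 0.0)
    d_base = max(chi2_base - df_base, d_model)
    return 1.0 if d_base == 0 else 1.0 - d_model / d_base


def judge_fit(rmsea_value: float, srmr_value: float, cfi_value: float) -> Dict[str, str]:
    """Apply the cut-offs stated in the text to each index separately."""
    if rmsea_value < 0.05:
        rmsea_label = "good"
    elif rmsea_value < 0.08:
        rmsea_label = "acceptable"
    else:
        rmsea_label = "unsatisfactory"
    return {
        "rmsea": rmsea_label,
        "srmr": "satisfactory" if srmr_value < 0.08 else "unsatisfactory",
        "cfi": "satisfactory" if cfi_value > 0.95 else "unsatisfactory",
    }


# Hypothetical inputs, chosen only to show the arithmetic.
example_rmsea = rmsea(chi2=30.0, df=10, n=201)                         # sqrt(20 / 2000)
example_cfi = cfi(chi2_model=30.0, df_model=10, chi2_base=420.0, df_base=20)
verdict = judge_fit(0.04, 0.05, 0.97)
```

Judging each index separately makes it visible when a model passes one criterion but fails another, which is why several indices are reported side by side.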

Results

Study sample description

A total of 12,754 health professionals completed the questionnaire with a mean age of 41.48 years (SD 12.47). A total of 10,738 (84.2%) were German-, 1788 (14.0%) French-, and 228 (1.8%) Italian-speaking. Most of the respondents were female (81%), nurses (58%), and worked in the acute care setting (42.8%). Participants’ characteristics are shown in Supplement B. The percentage of missing values on scale level was between 7 and 13%. Most of the scales had low floor and ceiling effects, except for the scales “unfair behaviour”, “intention to leave the profession” and “intention to leave the organisation”.

Reliability

Table 2 shows the results for reliability of the scales stratified by language group. Scales that include at least two items were considered for calculation. In the German version, 20 of the 24 scales with at least two items exceeded the conventional threshold of 0.7 for Cronbach’s alpha, indicating sufficient internal consistency, whereas 19 scales in the French version and 17 in the Italian version reached this threshold. The scales “Quantitative demands”, “Opportunities for development”, “Scope for breaks and holidays”, “Feedback”, and “Demarcation” failed to show desirable levels for Cronbach’s alpha in some or all language groups, with values ranging from 0.39 to 0.68. The vast majority of scales showed fair (0.40–0.59) or good (0.60–0.74) scale consistency as measured by ICC.

Table 2 Reliability of the German, French, and Italian versions of the modified COPSOQ scales

Validity

Figure 2 illustrates the mean values (between 0 and 100) on the domain level (demands at work, work organisation & content, social relations & leadership, home-work interface and stress symptoms) as well as scales on job satisfaction, intention to leave (the organisation / the profession) and burnout symptoms. The figure demonstrates that the mean values for the German, French and Italian versions show similar low or high relative tendencies for each dimension/scale.

Fig. 2 Graphic comparison of mean values and standard deviation (SD) from the German, French and Italian version. Mean values and SD for demands at work, work organisation & content, social relations & leadership, home-work interface, stress symptoms, job satisfaction, intention to leave, burnout symptoms (all standard deviations are overlapping)

Construct validity on scale level

Table 3 presents the results of the CFA for each scale by language, using single items as indicators. R-squared showed predominantly satisfactory factor loadings with values higher than 0.40 in all language groups. Table 4 presents the corresponding model fit estimates for each scale and language version. The majority of the scales indicated a good to satisfactory fit with an RMSEA below 0.1, SRMR below 0.08 and CFI above 0.95. The scale Social Support at work did not meet any of the criteria in any language version.

Table 3 Results for the confirmatory factor analysis by scale including loadings, standard errors and variance explained, stratified by language
Table 4 Fit measures of scales by language

Factor invariance

The test of measurement invariance examines the psychometric equivalence of a construct across groups. Table 5 presents the findings of the invariance test. With p-values < 0.05, the test indicated variance across the language versions for 10 of the 15 scales, that is, a significant difference in the measured psychological construct between the language versions. Every dimension included scales that showed variance across language versions. In particular, the dimensions Work organisation & content and Home-work interface consisted solely of scales with variance across the languages.

Table 5 Test of factor invariance (loadings of the confirmatory factor analysis) across language groups

Construct validity on dimension level

Figure 3 summarizes the relationships between the dimensions and the assigned scales for the French and Italian versions. The models show that the majority of indicators have strong relationships with their dimensions, except for social relations (both languages) and sensorial demands (Italian group). The majority of the latent dimensions for the French version are strongly interrelated, with negative relations ranging from −0.65 to −0.72 and positive relations ranging from 0.68 to 0.89. In the Italian version, half of the latent dimensions show medium interrelations (−0.34 to −0.49 and 0.56), and the other half show strong interrelations (−0.77 and 0.79 to 0.9).

Fig. 3 Structural equation models on dimension and scale level. Structural equation models using dimensions as latent constructs and scales as indicators in the French (FR, n = 1788) and Italian group (IT, n = 228), respectively

Model fit was acceptable for RMSEA (FR 0.08, IT 0.08), and SRMR (FR 0.07, IT 0.07), respectively. Models did not show a satisfactory fit with regards to CFI (FR 0.82, IT 0.82) in either language.

Discussion

Valid versions of the COPSOQ are already available in German [25, 26], French [27] and Italian [28]. However, for the first time, a questionnaire for measuring stressors and consequences of work-related stress among health professionals is available for multilingual studies in the three languages German, French and Italian, which is, to some extent, comparable across those languages. Most of the translated and tested scales showed acceptable to good internal consistency. The CFA tends to verify the underlying theoretical model of Nübling et al. [25], which has already been tested for concurrent validity [51]. It also confirms the strong relationships between the dimensions, as well as the low values for the scales social relations and sensorial demands; we therefore support the proposal to remove or revise those scales [21].

Moreover, the results are comparable to a recently published study in which the latest version of the underlying questionnaire (COPSOQ III) was validated without an Italian version for international comparability [29]. However, there are differences regarding the reliability of some scales. In Burr et al. [29], the scales Predictability (0.62), Meaning of Work (0.62) and Job Insecurity (0.66) fell below the threshold value of 0.7, whereas in this study the scales Quantitative Demands (0.56 to 0.62), Opportunities for Development (0.65 to 0.68), Scope for breaks and holidays (0.39 to 0.43), Feedback (0.62 to 0.65) and Demarcation (0.39 to 0.40) did not reach the threshold. However, the scales for Feedback and Demarcation are no longer included in the COPSOQ III, which makes comparison of those two scales with the study of Burr et al. [29] impossible and highlights the diversity of the included scales within the national versions. Hence, the scales Feedback and Demarcation can be excluded in accordance with the latest COPSOQ III version. Furthermore, the COPSOQ III includes the dimension Control over Working Time, which consists of 4 items with a Cronbach’s alpha of 0.69 [28]. Two of these items match items of the scale Scope for breaks and holidays, which was found to have a low reliability in this study as well as in the study evaluating the German COPSOQ version [52]. The authors of the German COPSOQ version have acknowledged this issue and stated that they will observe it in further studies [52]. In the meantime, pending further development of the COPSOQ by the responsible COPSOQ network, researchers using the current version must decide in each case whether to prioritise international comparability or reliability. When international comparability is prioritised, it should be noted that the reliability of the compared scales would be limited.

Furthermore, the data used in the study of Burr et al. [29] are company-specific and were collected across a multitude of branches, whereas in this study the data come from health professionals working in the healthcare system, and are thus expected to differ to a large extent with regard to the working conditions and occupational culture.

Independently of the language version, short scales were affected by lower reliabilities. This finding is consistent with the discussed dependency of Cronbach’s alpha on the number of items [53]. In addition, some findings suggest that the affected scales should be evaluated to decide whether they should be enriched with additional items or excluded from the questionnaire.

Cultural and regional differences may have led to the different reliability per scale across language versions and therefore to a significant factor variance in 10 out of 15 scales. Although the variances have been demonstrated statistically, the question arises as to their clinical relevance. The differences in the estimates from Table 3 across the language versions, aggregated on the scale level, could indicate what statistically significant variance can nonetheless be tolerated for comparability across languages. Of the 10 scales with significant factor variance, four showed a difference > 0.1 in the estimates (opportunities for development, influence at work, social support at work, job satisfaction), implying that those scales should be revised to enhance comparability across language versions. In particular, the scale social support at work showed unsatisfactory fit measures with RMSEA > 0.05, SRMR > 0.08 and CFI < 0.95. Unfortunately, fit measures on scale level of the COPSOQ from other studies are not available for comparison [28]. In this respect, there is a particular need for a revision of this scale in terms of correct translation and fit. In addition, future studies should include fit measures in the psychometric testing of the COPSOQ. When using the current version, one should not assign too much significance to the results of the scale social support at work. In Switzerland, researchers have to deal with a heterogeneous population when surveying nationally, due to the different language regions, despite the country’s small size in relation to other countries. It is known that linguistic differences often go hand in hand with cultural differences and should therefore be considered when developing a measurement across languages and/or cultures [54]. Several questionnaires appeared to struggle with invariance across language versions [30]. One reason for the statistical differences across the language versions could be that the French and Italian language regions in Switzerland have higher numbers of foreign health professionals, such as cross-border workers [55], whose evaluation criteria might differ from those of domestic personnel, for example in terms of job insecurity (e.g. migration policy). An analysis of the missing values at the item level could indicate cultural issues, which should be addressed in order to enhance comparability.

Moreover, the enormous change in healthcare systems brought about by digitization [56] implies the emergence of new influencing factors from the interaction of health professionals with technology. However, new trends are continuously being monitored by the COPSOQ international network and are thus being incorporated into the further development of the COPSOQ [29].

Strengths & limitations

Besides a structured and carefully implemented translation process, one strength of the study is the large sample size across all health professions, settings and language regions, which allows a generalization of the findings. This study delivers important information for further research enabling multilingual research in measuring stressors and consequences of stress at work among health professionals in Switzerland. It provides an extensive amount of information on scales, which is expected to be helpful in future research aimed at advancing scale development and choosing appropriate scales. For the first time, language versions of the COPSOQ were comprehensively statistically analysed for their consistent measurement of the underlying construct.

Although the strengths are promising, they must be considered in the context of the limitations, since two-thirds of the scales differ significantly between the language versions regarding the measured psychological construct. In addition, the results presented in this study are limited to the healthcare sector. Therefore, further psychometric testing of the new multilingual COPSOQ versions in Italian and French should be carried out in other work sectors to further confirm our results. Hence, interpretation of the results across language regions must be made in the context of these differences. The findings could have originated in the floor or ceiling effects that were identified, which indicate limited discrimination properties of some scales. Moreover, the study included data sets from two measurement periods, which may have led to duplicates, some of which may have remained undetected due to possible misstatements. Future research should make it possible to assign two measurement points to one individual, which would enable an analysis of test-retest reliability. This analysis has been found to be more appropriate for the analysis of the reliability of psychosocial work environment scales [57]. Finally, several scales were measured with single items or two items; it is thus possible that the construct to be measured was not sufficiently covered by these items.

Conclusions

This article presents the psychometric properties of a trilingual questionnaire that measures stressors and consequences of stress at work among health professionals. The COPSOQ is known as a generic instrument across branches. An adaptation to working conditions in the healthcare sector could optimize the psychometric properties of the instrument. Hence, future investigation to optimize internal and construct validity of some scales and dimensions is needed to improve the questionnaire. The identified variances across language versions imply re-evaluating the questionnaire to determine whether it is biased by cultural factors, which should be identified in advance.